Visual Basic 6 Client-Server Programming Gold Book
(Publisher: The Coriolis Group) Author(s): Michael MacDonald and Kurt Cagle ISBN: 1576102823 Publication Date: 10/01/98
Introduction
What's On The CD-ROM
About the Authors

Part I: Basics Of Client/Server Development With Visual Basic 6
Chapter 1: An Introduction To Client/Server And Networks
    A History Of Files
    What The Heck Is Client/Server?
    Designing The Client/Server System
    Network Topologies And Architectures
    Network Protocols
    Network Operating Systems
    Where To Go From Here

    Database Design
    Data Definition Language
    Data Control Language
    Where To Go From Here

Part II: Visual Basic 6 Database Programming
Chapter 5: Data Access Objects (DAO)
    DAO Object Models
    DAO Objects
    The Data Control
    Where To Go From Here

Part III: Visual Basic 6 And The Internet
Chapter 12: The ABCs Of XML
    Regaining Context With XML
    Leveraging The MS XML API
    Describing Your Data: The DTD
    Building An XML Application
    Where To Go From Here

Part IV: Appendixes
Appendix A: Creating The Sample Database
Appendix B: Differences Between Jet SQL And ANSI SQL
Appendix C: ODBC Functions

Index
Introduction

I have a confession to make. As recently as a year ago, Visual Basic was not my preferred tool for developing client/server applications. I have been developing database applications for two decades and have used many languages and tools. While I will never claim to know it all, I have a good sense for what works in the real world and what doesn't. As far as client/server development is concerned, Visual Basic 5 seemed to me to be right on the edge of being a viable tool for large applications. Visual Basic 6 has changed my attitude.

The client/server developer needs a development tool, not just a language. The tool needs to be robust, needs to support a myriad of different backends (data sources), and needs to be able to produce a variety of different types of applications, from single-user desktop solutions to multi-tiered applications deployed on the Internet. Visual Basic 6 is a superb tool adaptable to those types of projects and more. VB6 is not best of breed in all categories. Weaknesses remain in areas such as object orientation and data modeling. The tools that are there sometimes lack a little in how well they are integrated. Even so, VB6 is today's best choice for rapid application development of data-driven applications in a client/server environment.

In this book, my co-author Kurt Cagle and I have gleaned what works and what doesn't from months of work with VB6 betas using a variety of backends and application platforms. Between us, we have almost four decades of real-world experience. We have put together a book that is neither a rehash of the Help files nor a pie-in-the-sky Microsoft marketing brochure. We have assembled a guide to the development of client/server applications that will scale as you need and that will comfortably accommodate the rapid changes in
technology. We have kept our eye on the target audience, the experienced Visual Basic developer, while acknowledging that "experienced" means many different things in many different environments. We do not insult the reader's intelligence with blow-by-blow details of using the Application Wizard, for instance. But we also do not assume that the reader has previously built large-scale, multi-tiered client/server systems. In fact, very few VB developers have built such applications. So, we lay the groundwork as we incrementally build the core knowledge from chapter to chapter.

Chapters 1 through 3 concentrate on the environment within which the Visual Basic client/server application will be running. We discuss the whys and wherefores of client/server in terms of the network and the database. We discuss and summarize network issues and database platform issues. We show you how to intelligently design a database and how to use SQL.

In Chapter 4, we cover the gamut of Visual Basic 6 database access techniques, including DAO, RDO, ODBC, VBSQL, and ADO. We offer specific advice on when to use what, as well as our opinions on the relative merits (and demerits) of each approach. Chapters 5 and 6 cover development using DAO and RDO, respectively. While most developers will eventually move to ADO, both DAO and RDO will be around for some time to come.

Chapter 7 introduces ADO and OLE DB. We show you how to use these new tools, taking the time to compare and contrast them with the more familiar DAO and RDO models. In Chapter 8, we guide you through the conversion of existing DAO and RDO projects to ADO. We use the example projects built in Chapters 5 and 6 as scenarios and provide detailed step-by-step guidance. Again, we provide this guidance in a real-world manner, resorting to brute-force techniques when such techniques work best.

Chapter 9 gets into more advanced and efficient techniques using ADO and OLE DB. We show you some of the newer tools with detailed examples. We discuss the DataEnvironment object, the DataRepeater object, the Format object, and so on. We make no bones about where things are a little rough around the edges, showing you how to get around the flaws in hierarchical data presentations and similar gotchas.

Chapter 10 is where we start hammering away at solid object-oriented development techniques instead of just paying lip service to them. We detail the advantages and disadvantages of different approaches to object-oriented development using classes. We discuss the efficiencies and inefficiencies of different binding techniques. From there, we develop business objects from classes that act as data providers to different objects, optimizing each for the circumstances under which it is deployed. We show you critical gotchas when moving a local business object to a remote platform and, frankly, show you how to plain make it work.

Chapter 11 takes advantage of the knowledge assembled in the first ten chapters to explore advanced database techniques, including the creation and use of stored procedures and triggers. We deliberately placed this chapter at this point in the book both because these techniques help to solidify the traditional client/server application and because the techniques are absolutely
critical to the success of Web-based applications, which we discuss in the remainder of the book. Because so much is said about stored procedures and triggers, and because so few shops actually make effective use of them, we explore unusual uses of them, both in terms of data validation and in the creation of a self-referencing data dictionary.

And that brings us to the last third of the book: VB6 and the Internet. While client/server applications are, by definition, network applications, the Internet is the mother of all networks. The Internet connects 79 million people in the United States alone. With servers in excess of 10 million and Web pages in excess of 300 million, the Internet only continues to accelerate its growth. The astonishing thing is that each of those 79 million-plus people is connected to all the others and to many millions more around the globe, each one redefining what we mean by client/server programming. As cable modems and DSL lines begin to replace 28.8Kbps and 56Kbps analog modems, the number of people on the Internet, and the uses for it, will jump astronomically, as will the expectations of users.

Client/server programming is all about getting data to and from the user. Visual Basic 6 dramatically redefines the boundaries between client and server, between data and user, offering a combination of one of the best RAD development environments on the planet with all the power of Active Server Pages and Dynamic HTML. In essence, with Visual Basic 6 you can create sophisticated Web applications targeted to the widest possible audience using the same tools that you have already mastered for other client/server development. The final four chapters of this book focus on this new technology, with an in-depth look at Internet Information Services applications and Dynamic HTML applications. Additionally, one of the hottest topics in data communications, Extensible Markup Language (XML), is explored in detail, showing how you can take advantage of this new data standard in your own programs. Finally, this book looks at several useful technologies for the Internet-savvy database engineer, including remote component servers, client- and server-side scriptlets, data persistence, and more.

There is no right way or wrong way to use this book. (Well, using it as a doorstop or as something to raise your monitor probably isn't the best use we could imagine.) The experienced client/server developer may skim or skip the first three chapters. Readers new to client/server may wish to spend extra time on those chapters. Readers making a commitment to ADO should at least read Chapters 7 and 9 before skipping ahead to the fun stuff (Web development). Those looking to check out the viability of Web-based development may want to skip ahead to Chapter 12 before coming back to other portions of the book. At the front of each chapter, we have included keywords highlighting the topics to be covered, and at the end of each chapter, some suggestions for where you may want to go next.

Also, be sure to check out the code samples on the enclosed CD-ROM. Several
people checked each piece of code. Although we have striven to make sure that every application works properly, the nature of the material is such that it was not possible to test on every combination of platforms. For example, you may need to make minor alterations in SQL syntax depending on what database you are connected to.

Please pay particular attention to the fact that you will need to do some setup on your own system. Many of the examples use an ODBC data source; after creating the database (we have included the SQL statements to do so on the CD-ROM), you will need to set up the ODBC data source in the Control Panel. The paths to the data in some of the examples may need to be changed to reflect your paths. Also note that some samples connect to server applications. Since these server applications need to be registered (see Chapter 10 in particular), you will need to compile them, go to the References dialog box, and reselect the references. Otherwise, you will get mysterious error messages.

For the Internet chapters, please note that many of the examples were done using Internet Explorer version 5. You may wish to download that from the Microsoft Web site. We would have included it on the CD-ROM, but it was in beta as this book was going to press, and a more current release will surely be posted before you read this.

You can use Access for most of the examples in this book if you want to experiment at home or otherwise do not have access to a relational database server. You can also download trial versions of a number of good single-user database engines. Sybase SQL Anywhere (renamed Sybase Adaptive Server Anywhere as this book was going to press) is an easy-to-set-up-and-use product, available from www.sybase.com/products/anywhere/index.html as a free 60-day trial.

If you like the book, please buy a couple dozen more. You can give them to your kids, your spouse, your mother (she will be impressed and thankful), and your boss (he or she will also be impressed and will give you a raise). Coriolis always invites comments, suggestions, criticisms, and so on. Visit their Web page at www.coriolis.com to find contact information or to check out any updates to the book.

Thank you,
Michael MacDonald ([email protected])
Kurt Cagle ([email protected])
What's On The CD-ROM

…and sell through your Web site. (Trial version)

Suprasoft Crystal Reports Crystal Design Component: A custom component that encapsulates the previewing of Crystal Reports reports inside an easy-to-use control. This is an efficient and productive building block for creating a unique and customized preview application or front end for your software. (Limited version)
Requirements:
Minimum of 486 or equivalent processor
16MB RAM
Windows 95/98/NT
Visual Basic 6
Acknowledgements
This is the area where I always get to thank everyone who so much as returned a phone call. But there were some very special people behind this book. I am especially indebted and grateful to the publisher itself, Coriolis Group Books, for allowing the extra time to produce a book that was solid instead of a book that was early to market. It says a lot about the integrity of an organization that it puts quality ahead of profits.

I want to thank the Acquisitions Editor for this project, Stephanie Wall, for giving me the opportunity to do this book and for a lot of cheeriness and professionalism. Toni Zuccarini was the project editor, a job that does not get nearly enough thanks or recognition. Toni coordinated a myriad of details with grace and aplomb. John Lueders was the technical editor and painfully (to me as well as to him, I am sure) reviewed every line of code in the book and on the CD-ROM, offering advice and criticism, and did it well. Kristine Simmons was the copy editor for the project. Copy editors are those people who review manuscripts for clarity and grammatical correctness. Kris offered many fine suggestions and untangled sentences and thoughts far better than I ever could have. I have worked with many copy editors, and (I mean this in all sincerity) Kris is probably the best.

I have to acknowledge the contributions of my co-author, Kurt Cagle, who stepped up to bat to handle Chapters 12 through 15 and really did a fantastic job.

There are many people who go into the creation of a book who work behind the scenes without much in the way of recognition: Alan Pratt is the Technology Director and keeps, well, technical things moving between Coriolis and authors; Robert Clarfield coordinates a blizzard of details in getting that CD-ROM into the back of this book; Wendy Littley is the Production Coordinator and is in charge of getting the text laid out, getting proofs, coordinating among a zillion people, and so on; Tony Stock designed the cover of this book (and you thought I did that myself!) invisibly but professionally; and there are many others. To all of them: thanks. I need a vacation!
Dedication
As always, to my family (my wife, Patricia, and my children, Amanda and Peter), who are the ones who truly make the sacrifices when I work past midnight. Also, to some very special colleagues whom it has been my privilege to work with for more than a decade, including Larry Altrich, Steve Remmes, and Randy Vance. And finally, to a person who took a neophyte under his wing almost 20 years ago, Dr. Thomas Gross.
Michael MacDonald

To my parents, who taught me that humor and compassion will defeat arrogance and stupidity every time.
Kurt Cagle
Part I: Basics Of Client/Server Development With Visual Basic 6

Chapter 1: An Introduction To Client/Server And Networks
Key Topics:

How we arrived at the relational data model
Distributed computing
The definition of client/server
2-tiered, 3-tiered, and n-tiered architectures
The Microsoft Services model
COM/DCOM and the Component model
Application partitioning
Network topologies and architectures
In this chapter, I introduce the underpinnings of client/server and how it evolved from the distributed data model of the 1970s and 1980s to the multi-tiered model of today. For some readers, this material is old hat; skim it or move on to the next chapter. For others, however, the material provides real-world background against which to set the remainder of the book in context. The material is as applicable to development with any other language as it is to development with Visual Basic (VB).

This chapter is far-ranging in the topics it covers. I discuss the evolution of file formats: how they led to the relational model of today and the object model of tomorrow, and how files evolved to support the distributed data format that was a direct ancestor of today's client/server. I define basic and advanced client/server architectures, and I conclude with a review of the network topologies and architectures upon which client/server is based.
A History Of Files
Since the inception of the computer, there have really been only three file types: sequential, navigational, and relational. We are now seeing the introduction of a fourth type, the object model, along with relational/object hybrids. Concomitant with this evolution has been a shift of responsibility from the application to the database. For instance, in the sequential model, the application was responsible for the location of any given record as well as for the enforcement of all business rules. In the following sections, I review the evolution of files; in other sections of the chapter, I point out how this evolution mirrors the move to client/server development.
Sequential Model
Sequential files consist of fixed-length records, each of which typically contains fixed-length fields. The 80-column cards of old were essentially sequential files, as were (and are) files stored on magnetic tape. A variation on the sequential model, the text file, evolved when computer languages reached the point where they could use a character (such as a carriage return) as a record delimiter instead of depending on the end of a record being in a fixed location. Files such as AUTOEXEC.BAT and CONFIG.SYS are text files. Visual Basic supports both forms of sequential files, using the Open keyword to open the file, Close to close the file, and a variety of keywords to read from and write to the file depending on the nature of the file's use. I review these in Chapter 4.

The key advantages of the sequential file model are simplicity of programming file access and the speed with which an entire file can be processed. On the other hand, it is not possible with sequential files to locate any one record without first reading all the records before it. For instance, to pull up John Smith's customer record, the program must read the first record in the file and determine whether it is the correct one. If it is not, the program must read the next record, and so on, until the desired customer is found. As a result, the sequential file model does not support any type of ad hoc reporting capability. The application program is responsible for all file I/O, for all record searches and updates, and for all data integrity measures (such as ensuring that an order has a valid customer).

Some improvements were realized by using random access techniques. In theory, if your application knew that John Smith's record was the 318th record in the file, the program could move directly to that record without reading the first 317 records. The caveat, of course, is that the application was then responsible for creating and maintaining some sort of index into the file, not an easy piece of logic to code. Programming languages had to evolve to support this, as Visual Basic does when files are opened in Random mode. Random access requires that every record be fixed length. A variation is Binary mode, where, instead of accessing a record by specifying its record number, the program uses a byte offset from the beginning of the file.
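To make this concrete, here is a minimal sketch of sequential access in Visual Basic. The file name and field layout are hypothetical; Open, Input #, EOF, and Close are the built-in VB file I/O keywords mentioned above:

    ' Scan a comma-delimited text file (sequential model) record by record.
    Dim sName As String, sState As String
    Open "C:\Data\CUSTOMER.TXT" For Input As #1
    Do While Not EOF(1)
        Input #1, sName, sState    ' Read one record's fields
        If sState = "CA" Then Debug.Print sName
    Loop
    Close #1

Note that even this simple filter must read every record in the file; the search logic lives entirely in the application, which is precisely the weakness of the model.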
Navigational Model
The navigational file model, sometimes called the hierarchical model, was a major improvement. The term stems from the fact that it was usually necessary to navigate to a desired record by first accessing other records. Often, the details of this navigation were hidden from the developer so that the database did the dirty work of locating a specific record. Common examples were dBase and Btrieve files. On the mainframe, the navigational model was implemented with the Virtual Storage Access Method (VSAM). Visual Basic supports the navigational model in two ways. Using a file opened in Random mode, you can manually maintain some sort of hashing algorithm to navigate from record to record. Often, this method is implemented in a manner similar to what C programmers know as linked lists. A linked list essentially maintains a series of pointers into a structure or file, allowing navigation by (for example) account number or customer name. Visual Basic also allows connection to navigational databases via the Microsoft Jet engine. Although the interface provided is SQL, the underlying file is still accessed navigationally. Microsoft refers to this as the Indexed Sequential Access Method (ISAM).
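Here is a minimal sketch of the do-it-yourself navigational approach in Visual Basic: a fixed-length record opened in Random mode, with a hypothetical NextRec field serving as the linked-list pointer just described. The record layout and file name are illustrative only:

    ' In the module's Declarations section: a fixed-length record for Random mode.
    Private Type CustomerRec
        CustName As String * 30
        State    As String * 2
        NextRec  As Long           ' Record number of the next record; 0 = end of chain
    End Type

    Sub WalkChain()
        Dim rec As CustomerRec, lRecNo As Long
        Open "C:\Data\CUSTOMER.DAT" For Random As #1 Len = Len(rec)
        lRecNo = 318               ' Jump directly to record 318; no scan of records 1-317
        Do While lRecNo > 0
            Get #1, lRecNo, rec    ' Direct positioning by record number
            Debug.Print rec.CustName
            lRecNo = rec.NextRec   ' Navigate via the stored pointer
        Loop
        Close #1
    End Sub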
Relational Model
The relational model organizes data into related tables. A table is a two-dimensional grid of columns and rows where each row is a record and each column is a field in that record. Each table has a unique identifier, called a primary key, which allows the retrieval of any one row without referring to any other row. An example of a primary key might be customer number. Tables are related by common information. An Orders table would be related to a Customer table via a common customer number column: Each table has a column containing customer number. The relationship is enforced via a foreign key. The foreign key is a special type of index on the database that
enforces a business rule such as "no order may exist without a valid customer number." This enforcement is called referential integrity. The database typically sports other methods of maintaining data integrity, such as restricting the value of a gender column to "male" or "female." These enforced rules are called constraints.

There are a number of vendors of relational database management systems (RDBMSs), including Oracle, Informix, Sybase, Microsoft, and IBM. All of the RDBMS engines use SQL as the language for accessing and manipulating data. The majority of this book is devoted to using Visual Basic in an environment with an RDBMS such as Oracle, including techniques for maximizing performance of the database as well as maximizing developer productivity. The next two chapters introduce RDBMSs and the use of SQL.
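The following sketch shows how these concepts look in data definition language (DDL). The tables and columns are hypothetical, and the syntax is ANSI-style; a particular RDBMS may require minor adjustments (see Appendix B for Jet versus ANSI differences):

    CREATE TABLE Customer (
        CustNumber INTEGER     NOT NULL PRIMARY KEY,  -- primary key: unique row identifier
        CustName   VARCHAR(30),
        Gender     CHAR(6)     CHECK (Gender IN ('male', 'female'))  -- a constraint
    );

    CREATE TABLE Orders (
        OrderNumber INTEGER    NOT NULL PRIMARY KEY,
        CustNumber  INTEGER    NOT NULL,
        -- Foreign key: referential integrity ("no order without a valid customer")
        FOREIGN KEY (CustNumber) REFERENCES Customer (CustNumber)
    );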
Distributed Computing
To understand the move toward client/server, it is necessary to digress briefly to the problems inherent in the centralized processing model of the mainframes of the 1960s and 1970s. All data, along with the application programs written to manipulate that data, resided within the confines of a centralized computer. For a company with a single office in Boston or Chicago, this arrangement was fine. However, for a company with offices around the country or around the globe, it created practical problems: If the home office (and the computer system) was in Chicago, office workers in Boston and Los Angeles had no practical access to data relevant to their operations. The concept of the Distributed Computing Environment (DCE) sought to mitigate this problem by moving data closer to users on multiple machines. This solution had the added benefit of spreading processing loads across multiple machines. On the other hand, it was difficult to update information that was spread over multiple computers. It was also difficult to merge the data back together for reporting.

As networked computers evolved, so did the proliferation of DCE. Networks were (and are) a natural platform on which to distribute data. Contrary to popular belief, networks have been around almost as long as computers. In 1964, the United States government contracted with the Rand Corporation to design a network that could continue communicating even if a portion of the network was destroyed in a disaster. The result, ARPANET, came online in 1969 and eventually evolved into what is known today as the Internet. Networks stretched the definition of DCE by adding the concept of sharing resources, such as printers and other expensive hardware. As the expense of networks decreased and their capabilities increased throughout the late 1980s, the network exploded into prominence in corporate America.

With the increased use of networks, sharing data on file servers was a natural happenstance. This development, however, revealed the weakness of networks: The bottleneck on nearly any network was and is the low bandwidth, the amount of data that can be moved over the cabling. As an example, a high-speed Ethernet network can theoretically move data at 100Mbps (megabits per second). In reality, the wire (the network cabling) is shared by many users, so even though the theoretical throughput sounds high, effective throughput dwindles as more users are added to the network. Worse, the wire can only carry one piece of data at a time. Therefore, as more users are added, the potential for two users attempting to send data at the same time increases. When this happens, a data collision results, and the data has to be retransmitted. Thus, as network utilization increases, performance degrades very rapidly. Client/server solutions take these performance issues into account by minimizing the amount of data being moved.
Figure 1.1 In a two-tiered client/server application, the client requests records from the database server, which processes the request and returns only those records that meet the criterion.

Contrast this setup with just a shared file on a server. (I discuss this concept further in Chapter 2, where I define the term RDBMS engine. With a shared file, there is no separate process, no program, running on the server; the file, perhaps an Access database, is merely shared by multiple users.) Let's assume that we have 10,000 customers for whom we store information. We want to find all customers who live in California. Suppose 50 customers are from the Golden State. With a shared file, the program has to read the entire file and determine which customers are from the state of California. To do that, all 10,000 records are sent from the server to the client. The user is going to wait a very long time for a response.

In a client/server environment, the client creates a SQL statement such as SELECT * FROM Customer WHERE State = 'CA'. The request is sent to the database server. The server evaluates the request and finds the 50 records before sending any data back to the client. Because only 50 records are sent to the client (instead of 10,000), network traffic is reduced dramatically.

The simplistic model that I have just illustrated works fairly well when the number of simultaneous users, such as in a departmental application, is not too high. However, it begins to sag when we try to scale (grow) it to the enterprise level. In the sections that follow, I address this issue as well.
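The client side of that exchange can be as small as the sketch below, written with ADO (introduced in Chapter 7) purely for illustration. The DSN name and column are hypothetical, and a project reference to Microsoft ActiveX Data Objects is assumed:

    Dim cn As New ADODB.Connection
    Dim rs As New ADODB.Recordset

    cn.Open "DSN=SalesDB"            ' ODBC data source pointing at the database server
    rs.Open "SELECT * FROM Customer WHERE State = 'CA'", cn

    Do While Not rs.EOF              ' Only the 50 matching rows ever cross the network
        Debug.Print rs!CustName
        rs.MoveNext
    Loop
    rs.Close: cn.Close

The point is where the filtering happens: the WHERE clause is evaluated by the server's engine, not by the client.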
Multitiered Architectures
With the terms two-tiered, three-tiered, and even n-tiered architectures, we are referring to the number of layers of communications that make up an application. Each term represents a division of duties. If you write an application program that communicates with a database server, you construct a two-tiered application. The database is primarily responsible for returning data to the client. The client is primarily responsible for providing the user interface and the business logic. This structure was shown in Figure 1.1.

There are a number of problems with this approach. Although a goal of client/server is to minimize network traffic, the fact remains that if you add enough users, the network will still be crushed. The database itself can support only so many active, simultaneous connections before its performance begins to sag, simply due to a lack of resources (memory and so on). Also, you tend to reach a point where the network traffic becomes too high. If your multiuser FoxPro system could support 5 or 10 users, the two-tiered client/server equivalent could probably support 50 or 100 (because the client/server approach puts much less strain on network resources than does the shared-file approach used in a FoxPro application). Beyond those 50 to 100 users, however, the network still can't handle the amount of data being moved around.

Other complications come from the sheer complexity of maintaining that many clients. Assume you do manage to get 1,000 users on your client/server application. Each time the application needs to be maintained, all 1,000 PCs must be updated. That is a Herculean task for which no good administrative tools exist. Further, today's graphics-heavy environments and intensive computations require ever bigger and faster PCs, creating an endless cycle of expensive PC upgrades and replacements.

Therefore, the obvious answer is to offload as much processing as possible to an application server. In this scenario, the client application is mainly concerned with presenting a user interface and does as little business logic as possible. The business logic is moved to a server that performs business-oriented processing. The application server manages connections to the database and performs all data requests of the database. The database also performs a limited amount of logic by
enforcing business rules on the server. This three-tiered model is illustrated in Figure 1.2.
Figure 1.2 In this three-tiered system, the client connects to the application server, which in turn connects to the database server.
A more familiar example of a three-tiered application might be what you see on the Internet. Visit a site such as www.amazon.com (an online bookstore) and search for books with Visual Basic in their titles. Your PC is the client in this model, with the Web browser containing the user interface. It connects to a Web server at Amazon's site. The Web server's main job is to create Hypertext Markup Language (HTML) pages to be sent to the Web browser on the client. Extensions on the Web server perform some business logic and interact with a database server. The Web server sends a request to the database server asking for all titles that match your search criterion. The database server processes the request and sends the result back to the Web server. The Web server then formats the result set into an HTML page and sends that back to the client for presentation within the Web browser.

Beyond overcoming the practical limitations imposed by the two-tiered design, a multi-tiered design offers other advantages that exploit the location and nature of each physical tier. Figure 1.3 shows a group of PCs networked together with the application server on the same ring. Notice that the client PCs can communicate with the application server, but only the application server communicates with the database server. This arrangement reduces a certain amount of data traffic on the network because the database server isn't even on the network. Thus, PCs on the network that are not part of the client/server application itself suffer minimal impact on network performance.
Figure 1.3 A three-tiered configuration with the client computers communicating with the application server via the network.
COM automatically knows how to communicate with the object and can take advantage of its methods and properties.
Distributed Component Object Model (DCOM) technology extends this concept by allowing a component to be placed anywhere on a network. Any other object can communicate with it without knowing or caring where it is on the network. Assume you have a client/server application that uses the Internet as its network. Your Visual Basic application can invoke the methods of custUpdate (without any change in coding) regardless of whether it is located on your computer or on a server on the other side of the world.

You can take advantage of DCOM by deploying all your business logic on an application server. Instead of having a custUpdate object on every client in the network, the object is located only at the server. The object operates independently of any client and independently of the database server. Should a business rule change (perhaps you will allow a customer credit of up to $1,000 instead of $500), you have only one place to make the change, and that change is instantly made available and enforced for all clients. By placing all your business logic on an application server, you have effectively partitioned the business services from the user services (and from the data services).
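In code, this location transparency is nearly invisible. The sketch below assumes the component has been compiled and registered as described in Chapter 10; the ProgID OrderSvr.custUpdate, the machine name, and the method are hypothetical. VB6's CreateObject function accepts an optional second argument naming the remote server:

    Dim cust As Object

    ' Local activation: COM locates the component on this machine.
    Set cust = CreateObject("OrderSvr.custUpdate")

    ' Remote activation via DCOM: same component, different machine.
    Set cust = CreateObject("OrderSvr.custUpdate", "appserver01")

    ' The calling code is identical either way.
    cust.UpdateCredit 318, 1000    ' Hypothetical method: raise customer 318's credit limit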
little more than present forms, menus, and the like to the user. All of a sudden, we are actually taking some of the strain off the client, and fewer resources (such as memory, disk space, and CPU) are needed. The direction we are heading, then, is toward a thin client. Visual Basic 5 introduced the capability to create applications that actually run within a Web browser. No form or menu runs directly on the client; the application is housed in the browser, and the browser becomes the user interface. If users want to access a client/server order entry system, they launch Internet Explorer (or Netscape Communicator). If they want to access the accounting client/server application, they launch Internet Explorer. If the PC is powerful enough to run Internet Explorer, it is powerful enough to run any of the client/server applications. This is how you implement a thin client.

It means that the application is truly network independent: If the network supports TCP/IP, it supports your application. It also means that your application is operating system independent: If the client has a Web browser that supports active content, it can run your application. Your Visual Basic application system is portable across multiple operating systems.

Lest I sound like a walking billboard for Microsoft, let me be the first to say that the technology is not perfect. Can you do everything I just said in the preceding paragraph? Yes, sort of. You may be constrained by network limitations. (The Internet, for instance, has been known to get bogged down from time to time, and that is beyond your control.) The range of database connectivity options and methodologies (a major subject in this book) is confusing, to say the least. As the technology matures, it is probably wise to tread lightly and test carefully. Still, it is an exciting trend, and it will ultimately simplify development enormously.
Systems Analysis
Assume that we are building a traditional order entry system. The first step is to gather detailed requirements from users. We can use a number of well-established, formal methodologies (on which whole books have been written). Those experienced in the field often use a somewhat less formal approach. In this day and age of rapid application development (RAD), many companies turn to joint application development (JAD) brainstorming sessions. In these JAD meetings, users and technicians meet to hammer out first the functionality of a system and, normally, the data elements to be captured. The latter form the basis for the database design, which I review in Chapter 2. For instance, in an order entry system, typical functions will include the following:

Add, update, and delete customer.
Add, update, and delete order.

It is useful to compare the functions to the database design to verify that each database entity has at least one occurrence of add, update, and delete (to ensure, for example, that there is a mechanism to add a customer). It is often helpful to bring senior management to at least the initial JAD meeting to clarify the organization's goals. It is not uncommon to encounter a sizable disjoint between what senior management sees as the organization's philosophy and what the rank and file sees. Management can provide a mission statement from which all functionality is derived: "Sell widgets worldwide." The mission statement can then be viewed as the highest-level function of the organization. That function can be decomposed into smaller units that define how the overall mission is accomplished (see Figure 1.4).
Figure 1.4 The beginning of a simplified functional decomposition.

Each function is decomposed further until an atomic-level function statement is achieved. A function that has been decomposed to the atomic level generally provides one and only one output. An example of an output is a report or a table being updated. (In practical terms, you may then opt to combine two functions to make the application more usable. For instance, it is common to maintain order headers and associated line items on the same form.) As you can see, you are not defining how the application will accomplish a task. Instead, at this stage, you are merely concerned with what the task is.

Because of space constraints in this book, I cannot explain all of the ins and outs of the different approaches to systems analysis and design. I spend a good 40 hours teaching these skills to students in what is still a compressed format. However, I end up stressing that the whole process of systems design comes down to practicing common sense. Decomposing functionality is particularly troublesome to my students, who invariably are looking for the one right answer. I have to tell them that there is no one right answer. If the decomposed functions accomplish the goals of the system, then the decomposition is correct. In general, there is a one-to-one correspondence between an atomic-level function and a program (such as a Visual Basic form or report).

Once application functionality has been defined, you are ready to think in terms of how to make it happen. There are two aspects to this: defining the data that needs to be captured and describing how the data will be manipulated. (In Chapter 2, I discuss how to design a database.)
Encapsulating Functionality
Another source of confusion is the concept of encapsulating functionality. All that this intimidating word (encapsulation) means is embedding functionality into another object such that its behavior is exposed but its integrity is protected. Okay, maybe that is not so easy to understand. Consider the database server. When you send an SQL statement to the server asking it to update the address of a customer, you are not doing the actual update; the server is. The database engine edits your SQL statement for validity, ensures that the customer number is valid, verifies the state or province and postal code, and then performs the physical update. It places an entry in the transaction log and informs your program whether the update was successful. You cannot access the underlying data or functionality directly; it is encapsulated within the database server itself.

In designing your system, you seek to do the same thing: encapsulate your application logic. If you need to perform customer maintenance, you seek to isolate the data and logic. Isolating the data allows you to guarantee its integrity. For instance, in updating a customer's address, you might want to validate the state or province code as well as the postal code. Therefore, you could embed that logic into an object. By embedding the data in that object, you force all requests to update the data to go through the object's methods. This embedding is a recurring theme throughout this book, and I expand upon the concept in the following section.
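In Visual Basic, the natural home for this kind of encapsulation is a class module. The sketch below assumes a hypothetical class named CCustomer; the private variable is the protected data, and the Property procedures are the only door through which it can be touched:

    ' Class module: CCustomer
    Private msState As String       ' Encapsulated data; invisible to callers

    Public Property Let State(ByVal sValue As String)
        ' The validation rule travels with the data.
        If Len(sValue) <> 2 Then
            Err.Raise vbObjectError + 1, "CCustomer", "State code must be two characters"
        End If
        msState = UCase$(sValue)
    End Property

    Public Property Get State() As String
        State = msState
    End Property

Client code can write cust.State = "ca", but no code path can store an invalid value in msState, because every assignment is forced through the Property Let procedure.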
Figure 1.5 An application broken into three tiers of services.

To build such an application, you need to create two (or more) Visual Basic applications. The client services portion is actually rather simple, needing only to provide an interface to the user and the knowledge of how to communicate with the business services tier. The business services portion is a separate program running remotely (from the client) to which multiple clients connect. Visual Basic provides a rich set of tools to build these portions; much of the hard work has already been done for you.
Distributed COM
When you build an ActiveX component and then use it in a VB project, COM provides a proxy and a stub for each side of the connection between your application and the component. A proxy is an object provided by COM that runs in the same memory space as the calling object. It packages parameters and acts as a bridge between the two objects. If your application is communicating with an ActiveX component, the proxy runs in the memory space of your application. It collects arguments to be passed to the ActiveX component's methods and properties and communicates with the component's stub. The stub, on the other hand, runs in the address space of the receiving object, unpackages the parameters received from the proxy, and passes them to the component. This communication works both ways, as shown in Figure 1.6.
Figure 1.6 COM and DCOM automatically provide proxy and stub services, allowing two objects to communicate with one another with little to no intervention on the part of the developer.

The workings of DCOM are almost identical. The application houses a remote automation proxy, which communicates with the component's remote automation stub. DCOM also adds network transport services in a manner that is invisible to the application, making the location of the component on the network irrelevant to the client. This allows for tremendous flexibility in design. Visual Basic (Enterprise Edition) provides tools to register components. In Chapter 10, I review step by step the entire procedure for implementing DCOM.
In the remainder of this chapter, I review some of the key concepts of the underlying network architecture. Although the client/server developer does not need to be a network engineer, it is helpful to understand the basics of the physical model upon which the application is based.
Peer-To-Peer Networks
A peer-to-peer network is one in which the computers are connected to one another directly, in series. Each PC typically contains a LAN adapter or Network Interface Card (NIC) with two jacks, and a line is run from one computer to the next. Typically, these jacks accept RJ-45 connectors (similar to, but slightly larger than, the RJ-11 connectors on your phone cord). Although it is simple to set up, such a network is typically slow because data passes through each computer. It is vulnerable to any individual workstation being powered down or crashing, and because the network has no server, there are minimal security constraints. A Windows 95 or Windows 98 network is a peer-to-peer network.
Star Networks
The star network topology implies a network where all computers are connected to a hub (a box, similar to a telephone switchboard, containing minimal intelligence). Multiple hubs are connected to one another. Star networks are simple to maintain but offer minimal security constraints. If a hub goes down, every computer attached to it loses the network.
Ring Networks
In a ring network, all computers are connected via a continuous cable. A multiplexing unit monitors the network, allowing only one packet to circle the network at a time. The packet is continually sent from computer to computer, and each computer examines the packet to see whether it is addressed to that workstation. If a computer needs to send information, it waits for the packet and attaches a request to transmit. The multiplexing unit gives permission to each workstation in turn, in a procedure akin to a parliament (giving each member a turn to speak but also allowing for higher-priority speeches, such as error messages). If the packet is accidentally destroyed, the multiplexing unit recreates it.

IBM's Token Ring architecture runs on a ring topology. It has the advantage of being very reliable, but because of the extra overhead of packet monitoring, it is somewhat expensive to maintain and does not scale to a large number of users.
Bus Networks
The bus topology connects computers along a single line of cable. Although each computer is typically connected to a hub, the internal wiring of the hub still connects the computers serially. When a workstation wants to send a packet of information, it listens for the line to be clear through a process known as carrier sensing. Only one packet can travel on the network at a time, and if two computers happen to transmit at the same time, a packet collision occurs. Each workstation listens for such collisions and, when they are detected, waits a random amount of time and then resends the packet. The most common implementation of bus technology is Ethernet. Although fairly simple and inexpensive to install and maintain, a bus network is vulnerable to any breaks in the cable. Further, as more users are added to the network, performance can degrade drastically. Still, Ethernet is by far the most common architecture. To maintain satisfactory performance, network engineers break a large Ethernet network into smaller LANs connected via bridges.
Network Protocols
In the 1980s, when we logged onto a bulletin board, we said the two computers were "shaking hands," agreeing on how to communicate with one another. Today, we more accurately refer to this process as agreeing on a protocol. A protocol is a low-level language with which two or more computers communicate. Imagine you send a 2MB file to a friend across the company or across the globe. The file is broken into packets, and each packet carries several pieces of information:

Destination address: The computer to which the packet is being sent.
Source address (usually): The computer or person from which the file was sent.
Checksum: A mechanism by which the receiving computer can verify the integrity of the data being sent.
Sequence number: A number that enables the receiving computer to reassemble all the packets into one file and to tell the sending computer which packets to resend in the event of transmission errors.
Data: The actual data being sent.

A number of protocols are commonly used in organizations today. Most are based on the Open System Interconnection (OSI) Reference Model, which defines standard layers of network communications. An in-depth discussion of these protocols is beyond the scope of this book; however, protocols that adhere to this standard can generally intercommunicate with relatively inexpensive translation hardware. In general, you can think of a network protocol as similar to an agreement at the United Nations to speak in one certain language that everyone understands.

The most common protocol in use is TCP/IP, the language of the Internet. The TCP (Transmission Control Protocol) portion governs how a packet moves from application to application. The IP (Internet Protocol) portion governs how data moves from computer to computer. Although it is the protocol of the Internet, TCP/IP is also widely implemented on internal networks within organizations. For now, understand that different networks employ different protocols and often employ more than one protocol simultaneously. IPX/SPX is the native protocol of NetWare networks. IBM LAN Manager's native protocol is NetBIOS. Other protocols that you may run into range from NetBEUI (Microsoft Windows NT) to AppleTalk (Apple Macintosh).
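As a purely illustrative aside, the packet fields listed above could be modeled in Visual Basic as a user-defined type. Real protocol headers differ in layout and size; the field names and widths here are hypothetical:

    Private Type Packet
        DestAddress   As String * 4     ' Destination address
        SourceAddress As String * 4     ' Source address (usually present)
        Checksum      As Long           ' Lets the receiver verify data integrity
        SequenceNum   As Long           ' Position for reassembly and resend requests
        Data          As String * 512   ' The actual payload
    End Type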
interface of a Unix platform. In the following sections, I review the more common network operating systems that you are likely to encounter.
Novell NetWare
Once the king of the hill, NetWare has seen its market share steadily decline relative to the Microsoft Windows NT juggernaut over the past few years. Whether the declining market share is deserved is a subject for conjecture. NetWare is known as a dedicated operating system, meaning that the server can run no programs other than the operating system itself. Any other program you want to run must actually become part of the operating system. You accomplish this by compiling the other program (an email or database server, for example) into a NetWare Loadable Module (NLM). When the NOS loads, any NLMs load as well and become part of the operating system itself. There is a clear advantage to this arrangement: Because the NLMs are part of the operating system, they run much faster than they otherwise might. They can take advantage of operating system services (such as file I/O) without going through intermediate drivers and link libraries. On the other hand, if the application crashes or is otherwise poorly behaved, the operating system will also crash (because the application and the operating system are one and the same).

A few years ago, Novell sought to expand its presence in the workplace with the acquisition of Quattro Pro and WordPerfect. Many in the industry believe this was a serious tactical blunder, removing Novell's focus from the network arena, where it had always been the market leader, and allowing competitors, particularly Microsoft, to make inroads. Novell eventually sold those products to Corel, but by then, the damage had been done. Regardless, NetWare has a large presence in the corporate arena and a strong suite of tools. An ironic side benefit of NetWare's market erosion is a relative abundance of talent to support the product set.
Banyan Vines
Banyan's product has never garnered as large a market share as it probably should have, always operating in the shadow of other players in the marketplace. Like Novell NetWare, Vines is a dedicated operating system. Although it is a solid platform for file services, I am not aware of any major database products that run on it.
Windows NT
The Microsoft juggernaut seems to roll on with Windows NT. The operating system was developed to replace Microsoft LAN Manager, which was jointly developed with IBM in the 1980s. Like Windows itself, NT did not enjoy widespread acceptance for several years. NT version 3.51 seemed to be the right answer for organizations, which began to embrace it as a viable network platform. NT version 4 is a robust operating system, although arguably not as powerful or scalable as Novell NetWare 4.x. In particular, the word among network professionals is that Windows NT 4 seems to hit a wall at about 1,100 users. Likewise, its directory services are weak in comparison to Novell NetWare's. However, NT 5 (in beta as of this writing) seems to answer many of these complaints.

Windows NT is a nondedicated server, meaning that other programs can run alongside the NOS itself. Microsoft traded some resource intensiveness (Windows NT needs more memory and CPU speed than most other NOSs to run comfortably) for a high degree of stability. The operating system spends a fair number of CPU cycles enforcing the integrity of operations. For instance, it does not permit any program to address hardware directly, and it prevents applications from writing to memory outside of their own address spaces. Windows NT sports a solid set of graphical, if somewhat complicated, tuning tools to maximize performance. Reportedly, Windows NT represented 50 percent of all network server licenses as of February 1998.
In this chapter, I guide you through the essentials of what is the heart and soul
of any client/server system: the database. Although a Visual Basic program can run against almost any back-end database, it is the Relational Database Management System (RDBMS) that brings the server to client/server. When training students or corporate clients in client/server development techniques, I stress that properly designing the database is 80 percent of the battle in constructing and deploying a successful client/server application. With a well-designed database, you have a good chance of deploying a successful client/server application. On the other hand, if the database is poorly designed, all the development skills in the world cannot create a robust, reliable client/server application. Therefore, no skill is more important in client/server development than properly designing a database. A properly designed database accomplishes three essential things for the client/server developer:
Through a properly normalized data schema, complete with intelligently designed integrity constraints, the database can assume much of the burden of validating basic business rules, such as requiring that an order have a valid customer. By using the referential integrity tools made available by the database, you minimize or eliminate altogether the chances of data corruption problems. By moving data update, retrieval, and validation processing to the server, you enhance application performance by minimizing network traffic and by moving some of the intensive processing from the client to the server (which, presumably, is a high-end machine optimized for the load).
Any database that organizes data in related tables and that can enforce those relationships via referential integrity can lay claim to being a relational database. The database engine manages the data and enforces the integrity constraints. If there is no engine, the responsibility falls on the application program to perform those roles. Consider a Microsoft Access database that I call demo. It is unquestionably a relational database. However, assume I put the database file on a server and then access it from multiple client programs. You will see that there are actually two files: demo.mdb is the actual database file (MDB stands for Microsoft database), and demo.ldb is a file used to manage record locking (LDB stands for lock database). Now, suppose you want to see all the customers from California. Your application generates an SQL statement that looks something like:
SELECT * FROM CUSTOMER WHERE STATE = 'CA'
Assume there are 50 customers from California on a table of 5,000 customers. All 5,000 records are returned to your application, and the application itself then reads through those records to find the 50 that you need. The application receives 100 times as much data as required. It is easy to imagine how this activity bogs down network performance, especially if 10, 50, or 500 client programs all make the same types of requests. This inefficient data processing occurs because the Access database is just a file, with no engine to process the data request. Contrast this example with one involving an RDBMS, such as Oracle or Microsoft SQL Server, which are true database engines. In this scenario, the SQL request is sent to the RDBMS, which processes the request on the server. The server sends back to your application only the 50 records that are required. Network traffic is reduced by 99 percent. The moral? A wolf in sheep's clothing is still a wolf. A database might be relational (such as Microsoft Access) or present an SQL front end (such as dBase), but if it is not a relational database engine, it is not an appropriate choice for client/server development.
-----------
proper choice is highly dependent on each organization's needs and environment. Further, although I have, in fact, used most of the products that I discuss in the next section (A Survey Of Available RDBMSs), I have not used all of them, and any specific recommendations are likely to be somewhat subjective. Of course, your organization may have already deployed an RDBMS that is serving you quite well. In that case, get a cup of coffee and then move on to the next section.
thousands or millions of detail records that are queried and summarized but are not typically updated. Executive information systems (EIS) and data warehouses are examples of these types of applications. Either way, the needs are quite a bit different from those for an OLTP system.

If you have a busy OLTP system, you might want to consider such factors as how a given database performs record locking. Sybase, for instance, uses what is called page-level locking: all records on a database page are locked when an update is to occur. Oracle, on the other hand, performs row-level locking: only those rows that are to be updated get locked. The locking scheme has implications if several users are trying to update different records on the same page (a sketch follows at the end of this section). With the Sybase product, the users have to wait until the first user has released the lock on that page. Microsoft SQL Server 6.0 performs page-level locking, whereas SQL Server 6.5 does pseudo row-level locking: The rows immediately before and after the row to be updated are also locked. SQL Server 7.0 performs true row-level locking. On the other hand, the nature of your data may be such that a page contains few rows, which makes the issue of row-level versus page-level locking less important.

If you are planning a DSS, you should consider how well the database handles very large volumes of data. Most of the major vendors support some variation on massively parallel processing (MPP), where a long-running query is executed in multiple threads to better handle the intensive querying and to speed up processing.

Determining which database will best handle your needs for an OLTP or a DSS is not an easy undertaking. My recommendation is to talk to peers who have used the products in environments similar to your own. In addition, check out the computer press for its independent benchmarking results. Lastly, consider searching the Internet (but avoid the vendors' own biased claims).
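To picture the page-level problem, consider the following sketch; the ACCOUNT table and its columns are hypothetical, invented here purely for illustration.

-- Session 1 (page-level locking, as in Sybase):
UPDATE ACCOUNT SET Balance = Balance - 100 WHERE Acct_No = 1001
-- The entire page holding account 1001 is now locked.

-- Session 2, before session 1 commits:
UPDATE ACCOUNT SET Balance = Balance + 100 WHERE Acct_No = 1002
-- If account 1002 happens to sit on the same page, session 2 must
-- wait, even though it touches a different row. Under row-level
-- locking (as in Oracle), both updates proceed independently.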
Database Administration
My own subjective experience is that Oracle's database engine is probably the most powerful in a high-transaction environment. However, this power comes at the expense of more complicated administration. To squeeze the best performance out of a complex database product such as Oracle, the database administrator (DBA) needs to monitor and refine a variety of operating parameters, such as defining the frequency of checkpoints, partitioning the data, and so on. (The particulars of database administration vary from database to database and are beyond the scope of this book.) How much administration is required also depends on the nature of the applications that the database supports in your own organization.
The good news is that most of the vendors (including Oracle) are simplifying administration; even so, a poorly configured database yields poor performance. Consider Microsoft SQL Server running on a Windows NT server with 1GB of RAM. By default, SQL Server uses only the first 16MB of RAM. A low-end RDBMS product that uses most or all available memory performs much better than a high-end RDBMS that has not been configured to take advantage of the available resources.
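As a sketch of the kind of configuration involved, assuming the SQL Server 6.5 sp_configure interface (where the memory option is expressed in 2KB units; the figure below is illustrative, not a recommendation):

-- Let SQL Server use roughly 512MB of the server's 1GB of RAM
-- (262144 pages * 2KB = 512MB)
EXEC sp_configure 'memory', 262144
RECONFIGURE
-- The memory option is static in this release; the new value
-- takes effect when SQL Server is restarted.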
require the database to work too hard to locate the desired data or force it to read through too many records to return the desired result set.
- The application data logic: The application (or user) can easily throttle performance with poorly phrased queries that do not, for instance, correctly specify table joins. Consider a department table that has 50 rows and an employee table that has 500 rows. A list of employees showing their departments should, of course, produce 500 records. If the tables are not properly joined, you might end up with 25,000 records instead (see the sketch after this list)! Now, consider a program that allows the selection of an employee for maintenance. Inexperienced developers invariably seek to present a drop-down list box containing all 500 employees. That means that each time the user wants to see one employee, the database must return 500: hardly the purpose of client/server. Worse, what if the company has 5,000 employees? You can see where performance would, again, rapidly degrade through no fault of the RDBMS.
- The RDBMS: The RDBMS itself can easily be so badly maintained as to produce unbearable results when even minimal performance tuning might yield improvements on the order of a 90 percent reduction in response time. Study the tuning manual that comes with each database. Periodically reorganize the database (a task that is similar to defragmenting your hard drive). Ensure that the database is allowed to use all of the resources (memory, disk, and so on) available. Update the system statistics frequently (system tables maintained by the database that give the query optimizer information about the types and volumes of data stored on the tables).
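Here is a sketch of the join mistake just described, phrased against the EMPLOYEE and DEPARTMENT tables used later in this book; the Dept_Name column is assumed for illustration.

-- No join predicate: every employee pairs with every department,
-- returning 500 * 50 = 25,000 rows
SELECT Emp_LName, Dept_Name
FROM EMPLOYEE, DEPARTMENT

-- Properly joined: one department per employee, returning 500 rows
SELECT Emp_LName, Dept_Name
FROM EMPLOYEE, DEPARTMENT
WHERE Emp_Dept_No = Dept_No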
Database Cost
Although I personally hate to see the cost of an RDBMS become a major consideration, the reality is that some shops do have to live within certain monetary constraints. A database engine such as Oracle might run into a licensing fee of many thousands of dollars, whereas a product such as SQL Anywhere might cost only several hundred dollars. It should be noted that many vendors will wheel and deal in price negotiations. Be careful when comparing prices. A visit to Oracle's Web site shows you that the licensing fee to run the RDBMS on a single server is $1,450 (at the time of this writing). The fine print near the bottom of the page reveals that this is for five concurrent users. Most organizations running Oracle have somewhat more than five users. When factoring costs, you need to consider the number of servers upon which the database is to run, the maximum number of concurrent users, the maintenance fee, and quite possibly a separate telephone support fee. The maintenance fee is often 12 to 18 percent of the licensing fee annually. Telephone support can easily run into thousands of dollars per year. A final consideration in database cost is the so-called cost of ownership, a term so often bandied about when discussing PCs. Above and beyond licensing, maintenance, and support costs, you might want to consider that the hardware requirements of the different RDBMS products differ (ignore the stated minimum requirements and concentrate on the recommended configuration), as do the internal support costs. A talented DBA can tremendously enhance the performance of the RDBMS (while commanding a salary that would make the corporate CEO blush).
You might want to consider the financial stability of the RDBMS vendor. As of this writing, Sybase was posting large losses and laying off 10 percent of the company's employees. There were also questions regarding Informix's financial results. Such news may be of concern in terms of a company's long-term viability (will it be in business five years from now?), the organization's ability to develop and improve the product, and its ability to support the product. (Please note that although Sybase's woes cannot be construed as good news for any customers, I personally do not predict that an organization that large will go under. The Sybase product set is excellent, and the company continues to boast a top-notch technical development staff across a diverse product line.)
Microsoft Access
As discussed earlier in this chapter, Microsoft Access is not truly an RDBMS. However, it is a popular development back end and offers some fairly sophisticated features, such as true database replication. On the other hand, it does not offer a complete SQL implementation (see Appendix B for a comparison of Jet SQL to ANSI SQL), and because it is not truly client/server, its performance pales against even the least expensive of the true client/server database engines. Nevertheless, because it is popular and because Visual Basic offers easy support for Access, I use it in examples throughout the book. Access Version 8 is bundled with Microsoft Office 97 Professional and is available as a standalone product. However, you can use Visual Basic to write to and read from Access databases even if you don't own Access itself. For more information on Microsoft Access, visit the Microsoft Web site at www.microsoft.com/access/.
Sybase SQL Anywhere
Sybase's SQL Anywhere, despite its small implementation, has a complete implementation of SQL (I discuss the ANSI SQL standard in Chapter 3) and offers full support for Transact-SQL. (T-SQL is a dialect of SQL used by Sybase SQL Server and Microsoft's SQL Server 6.5.) The product requires almost no administration but does not scale to the enterprise level. The current version of SQL Anywhere (as of this writing) is 5.5. It runs as either a server or a standalone (single-user) RDBMS on Windows 95, Windows NT, Windows 3.x, OS/2, MS-DOS, and NetWare. For more information, visit Sybase's Web site at www.sybase.com.
Oracle
Oracle Version 8 runs on a wide variety of platforms, including OS/2, NetWare, Windows NT, and Unix (including Sun Solaris, IBM's AIX, HP-UX, and SCO). A personal edition of Oracle runs on Windows 95/98. Oracle has long enjoyed a reputation as the most powerful database on the market, with advanced features optimizing large-volume data processing, distributed data management, Web connectivity, and robust OLTP. Still, the gap in processing capabilities between Oracle and its competitors has eroded and perhaps disappeared under certain circumstances. Although Oracle was once considered a pricey product, competition has driven the costs down. Oracle has been the number one vendor for many years. Whereas the two Sybase products offer extensions to SQL known as T-SQL, Oracle uses a robust dialect known as PL/SQL. As with T-SQL, I also present samples in this book using PL/SQL. For more information, visit the Oracle Web site at www.oracle.com.
Informix
For years, Informix was a distant runner-up in terms of sales to the two giants, Sybase and Oracle. In recent years, the Informix product line has grown in sales until it is now solidly entrenched as the number two vendor of RDBMSs. The Informix products scale well from the department to the enterprise level and offer well-integrated Web connectivity options. Informix sells versions of its enterprise server that run on most types of Unix and on Windows NT. For more information, visit the Informix site at www.informix.com.
IBM DB2
IBM's relational database, DB2, runs on a wide variety of platforms, including most types of Unix, Windows NT, OS/2, and Windows 95/98, as well as VAX VMS and IBM MVS and VSE. Although in the past its network-level database products have not fared well in comparison to the products of other vendors, its latest version has tested well on Windows NT. DB2 is of particular interest to those organizations that need a high degree of compatibility or interoperability with their mainframe database. For more information, visit the IBM Web site at www.ibm.com.
XDB
XDB Systems offers a well-done RDBMS that provides a high degree of compatibility with IBM's DB2 and, like DB2 on the network, would be of interest to shops also running a mainframe. By itself, XDB is a capable RDBMS. In addition, it has extensions that enable the querying of DB2 and IMS databases and VSAM files on the mainframe (though VSAM files cannot be updated). My own experiences with XDB have been positive, although I feel it is probably not as powerful as the very top-end RDBMSs. I am no longer nervous about the company's long-term prospects because it has announced its acquisition by Micro Focus, makers of COBOL tools for the network. This is a logical pairing because Micro Focus tools have always integrated well with XDB. The choice of XDB is appropriate for shops interested in leveraging legacy COBOL applications by migrating them to a client/server environment. For more information, visit XDB's site at www.xdb.com.
Database Organization
A relational database, unlike sequential files, organizes all of its data in a single file. Figure 2.1 illustrates a nonrelational and a relational design of a database with customers and orders. Both cases have four indexes: two on the customer data and two on the orders data. The nonrelational approach illustrates how you might build the database with dBase files: There are a total of six files with no true logical association between them. For instance, nothing stops me from opening the ORDERS.DBF file and adding an order even if no valid customer is associated with that order. Even worse, nothing stops me from exiting to DOS and typing DEL CUSTOMER.DBF. After doing so, I have two indexes serving no purpose and a whole lot of orders for which I have no clue about the customers.
Figure 2.1 Nonrelational databases are typically a collection of unassociated files. Relational databases contain all of their tables and indexes inside a single file.
On the right side of Figure 2.1 is a single file managed by an RDBMS, such as Microsoft SQL Server, containing the two sets of data and four indexes. The data is actually stored as tables; tables and indexes are both stored as objects within the database.
Tables
A database table is organized into rows and columns much like a spreadsheet.
The columns correspond to fields in a sequential file. The rows correspond to records. Figure 2.2 illustrates this with a simple implementation of a Customer table. Cust_No represents a column, whereas John Smith represents a row in a two-dimensional grid.
Figure 2.2 A relational database table is organized into columns and rows.
Every row in a table must be uniquely identifiable. In other words, there must be something on each row in a table that is different from any other row on that table. The column (or combination of columns) that makes a row unique is called the primary key. On the Customer table, the Cust_No column is the primary key; there can be one and only one customer 101, one and only one customer 102, and so on.
Figure 2.3 The second table, Orders, is related to the first table, Customer, through the Cust_No column.
Table Relationships
The RDBMS enforces the rule that the customer number on the order must already exist on the Customer table through the use of referential integrity. Specifically, you can create a foreign key on the Orders table that states that the Cust_No column must relate to a key on the Customer table. In general, the foreign key specifies that the values of a column (or columns) on a dependent table must exist uniquely on a master table. The Orders table is dependent on the Customer table because values on it must already exist in the master table. Sometimes this relationship is called parent-child. Figure 2.4 shows the relationship between the Customer table and the Orders table with the foreign key.
Figure 2.4 The Customer and Orders tables with primary keys and a foreign key enforcing their relationship.
TIP Foreign Keys And Primary Keys
The foreign key on a child table is dependent on a key on the parent table. In other words, the value stored in a foreign key column on a child table (such as the Cust_No column on Orders) must exist on the parent table; in this case, the Cust_No column on the Customer table. Some databases require that the foreign key relate to the primary key of another table. Others only require that the related key on the parent table be a unique index. Remember that a primary key is itself a unique index. If the related key on the parent table is, itself, a compound key, then the foreign key must also be a compound key. The foreign key must have the same number of columns as the related key on the parent table, and each of those columns must be the same data type.
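A sketch of the compound-key rule; both tables here are hypothetical, invented to show the shape of the declaration.

-- Parent: the unique key spans two columns
CREATE TABLE REGION_ORDERS
  (Region_No SMALLINT NOT NULL,
   Ord_No INTEGER NOT NULL,
   CONSTRAINT Pk_Region_Ord PRIMARY KEY (Region_No, Ord_No))

-- Child: the foreign key must supply the same two columns,
-- in the same order and with the same data types
CREATE TABLE SHIPMENTS
  (Ship_No INTEGER NOT NULL,
   Region_No SMALLINT NOT NULL,
   Ord_No INTEGER NOT NULL,
   CONSTRAINT Pk_Ship PRIMARY KEY (Ship_No),
   CONSTRAINT Fk_Ship_Ord FOREIGN KEY (Region_No, Ord_No)
     REFERENCES REGION_ORDERS (Region_No, Ord_No))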
Database Integrity
When designing a database, it is important to remember that the foreign key is always defined as a constraint on the child or dependent table and that table is always part of a one-to-many relationship. I cover the details of this concept under Database Design later in this chapter. However, for now, you should understand that related tables are always related in a one-to-many manner. For example, an examination of Figure 2.4 reveals that for any one customer, there might be many orders. However, for any one order, there can be only one customer. Wherever a one-to-many relationship exists, the table at the many end of the relationship is always the child table, and there should always be a foreign key defined to enforce that relationship. I review how to create foreign keys on the database in Data Definition Language later in this chapter. I described a foreign key as constraining the values that a certain column can contain on a table. This is but one type of constraint that the database can enforce. A foreign key enforces referential integrity. Although databases differ in their capabilities, in general, integrity constraints can be placed in three categories:
- Referential integrity: Enforces relationships between two tables.
- Domain integrity: Enforces the values that can occur in any column across an entire table.
- User integrity: Enforces the values that can occur in a column based upon values on the same row.
Domain integrity and user integrity can be enforced using a variety of constraint types:
- The primary key, as mentioned earlier, enforces the requirement that a value uniquely identify a row on a table. This is an example of domain integrity.
- A unique key requires that any value stored in a given column be unique within that column. For instance, a Social Security number might be defined as unique. A primary key is implemented as a unique key.
- A check constraint validates the contents of a column on a row at either the domain or the user level.
- A column may be defined as NOT NULL, meaning that the column is not allowed to be null. (Null is the absence of any value at all and is not the same as an empty string. I discuss null more thoroughly in Chapter 3.)
A check constraint can restrict a column's values absolutely. For instance, you can create a check constraint that requires salary to be greater than zero. Because this rule applies to any row on the table, it is a domain integrity constraint. On the other hand, you might define a check constraint, such as date of hire must be greater than date of birth (to avoid problems with child labor laws, of course), that is dependent on another column on the same row. This is an example of a user integrity constraint. I show the syntax for creating each type of constraint later in this chapter.
Data Types
Relational databases define each column to be a certain data type. Although a number is really just a number regardless of what database is storing it, the RDBMS vendors tend to differ somewhat in what they call those types. For instance, what Oracle calls NUMBER, Sybase and Microsoft call NUMERIC. Data types tend to fall into four categories:
- Character data
- Numeric data
- Date data
- Binary data
Character Data
Character data types can hold any alphanumeric values. These values are usually referred to as strings or string literals. Even if the string happens to contain a number, you cannot perform arithmetic operations on it. The two main character data types are CHAR and VARCHAR. A CHAR data type is a fixed-length field defined as follows:
Cust_LName CHAR (21)
If Cust_LName holds the value Jones, the database right-pads the field with 16 spaces to fill up the field. Fortunately, most RDBMSs do not require you to account for this when searching for values. Typically, the database automatically trims the longer of two search conditions:
SELECT * FROM Customer WHERE Cust_LName = 'Jones'
In this example, the database automatically takes into account that the search term Jones is shorter than Cust_LName. The VARCHAR data type (called VARCHAR2 by Oracle), as its name implies, is a variable-length data type and looks like the following:
Cust_LName VARCHAR (21)
In this case, the column holds up to 21 characters, but if the name Jones is stored, it is only 5 characters long.
TIP CHAR Vs. VARCHAR
There is some debate about which is the better data type to use. Certainly, the CHAR data type can waste a lot of space in a database. This was an important consideration when disk space cost many dollars per megabyte. However, disk space is cheap now, and the database has to work harder to find columns where variable-length data is used. It is difficult to say exactly what the performance cost is, but I would speculate that the average database takes a 10 percent performance hit if it employs all variable-length character types. My recommendation is to use fixed-length CHAR data types for all but the biggest columns (such as a column to record comments). If you do end up using some VARCHARs, put those columns at the end of each row, where the performance penalty is minimized.
Numeric Data
Unlike character data, numeric data can take many forms and varies more from RDBMS to RDBMS than other data types. In general, you will run into the following data types (refer to your vendor's documentation for any specifics or differences in implementation):
- NUMERIC (P, S): P stands for precision and S stands for scale. Precision refers to how many digits the number is allowed to hold in total, and scale refers to how many of those digits are to the right of the decimal place, as in the following example:
Ord_Amount NUMERIC (11,2)
In this example, the largest number that can be stored is 999,999,999.99. Even if your number has no digits to the right of the decimal place, you can use only nine to the left of the decimal point. Thus, a column defined as NUMERIC (4,4) can be no larger than 0.9999.
- INTEGER is a data type that holds a whole number from -32,768 to +32,767. An UNSIGNED INTEGER is capable of holding positive whole numbers from 0 to 65,535.
- LONG numbers can be any whole number from approximately negative 2.1 billion to positive 2.1 billion. As with the INTEGER data type, your database may also support an unsigned variation, UNSIGNED LONG, which holds a positive whole number from 0 to approximately 4.2 billion.
- FLOATING POINT, SINGLE, SINGLE PRECISION, DOUBLE, and DOUBLE PRECISION are all variations on floating-point numbers. A floating-point number can handle values that cannot be represented exactly in a fixed number of decimal places (as well as whole numbers, of course). A number such as 1.25 can be represented in a fixed number of decimal places; a number such as 1/3, which is .333333 repeating to infinity, cannot. The database stores these numbers in scientific format. Single precision numbers (SINGLE or SINGLE PRECISION, depending on your database) are usually 4 bytes long. Double precision numbers are usually 8 bytes long, effectively making them able to handle a much larger range of numbers.
Date Data
Dates are encoded on the database as numbers and are, therefore, limited in their range. Oracle's DATE data type, for instance, has a range of 1 January, 4712 BC to 31 December, 4712 AD. The major RDBMSs are generally Year 2000 compliant. However, you should consider that your front-end development language might not support the same range of dates. Visual Basic, for instance, has a range of 1 January, 100 to 31 December, 9999. Inputting dates into the database can sometimes be a chore. The format of the date that the database expects is usually a startup option (that is, it is specified as an optional parameter when starting the database engine on the server). Usually, you can override this setting in an individual session by using the SET command. Oracle's default date format is DD-MON-YY, as in:
INSERT INTO table_name VALUES ('18-Dec-98')
Sybase, on the other hand, allows dates to be entered as either 1998-12-18 (December 18, 1998) or 1998/12/18. To alter the default format, you can specify SET OPTION Date_Order = 'DMY', which then causes dates to be interpreted in day, month, and year order.
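In Oracle, you can also sidestep the default format one statement at a time with the TO_DATE function; the columns below belong to the EMPLOYEE table used throughout this book.

-- Supply the date in any format by describing that format
INSERT INTO EMPLOYEE (Emp_No, Emp_LName, Emp_Hire_Date)
VALUES (101, 'Jones', TO_DATE('12/18/1998', 'MM/DD/YYYY'))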
Binary Data
Generally, the only time you will need to store binary data is when you want to embed objects such as bitmaps or word processing documents into the database. You accomplish such a task with the BLOB (Binary Large Object) data type. Although some RDBMSs are adding object-oriented extensions to automatically handle these objects (refer to A Note About Object-Oriented Databases earlier in this chapter), you generally need to plan in advance how you will display and manipulate this type of data.
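As a sketch only: the type name varies by vendor (BLOB in several engines, IMAGE in SQL Server 6.x, LONG BINARY in SQL Anywhere), and the table here is hypothetical.

CREATE TABLE EMP_PHOTO
  (Emp_No SMALLINT NOT NULL,
   Photo_Desc VARCHAR (50),
   Photo BLOB,  -- the raw bytes of a bitmap or document
   CONSTRAINT Pk_Photo PRIMARY KEY (Emp_No))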
Database Design
The key to a solid client/server application is a solid database design. Two approaches dominate the task, and they have a considerable amount of overlap. Entity-relationship modeling seeks to identify entities, provide a unique identifier for each, determine relationships between those entities, provide a method for enforcing those relationships, and then assign attributes to those entities. This approach provides a logical design of the database, where entities map to tables and attributes map to columns. The other approach is database normalization. This approach seeks to take an existing design and validate it against a number of design guideline rules to eliminate processing anomalies. You accomplish this task by reassigning columns to existing or new tables. Although entire books have been written on the theory and practice of database design, I provide an overview of the essentials in the next few pages.
Entity-Relationship Modeling
With entity-relationship modeling, you seek to identify those entities that represent a business and eventually map them into a physical database design. Figure 2.5 shows a graphical representation of the process.
Figure 2.5 Summary of the entity-relationship modeling process.
Let's assume that we are modeling a simple college. The first thing to do is
identify the business entities. An entity is something physical with a unique identifier. If we were modeling a mail-order company, typical entities would be Customers, Orders, Items, and so on. For our example, we have Students and Courses. Because an entity must be uniquely identifiable, we assign a primary key to each one and draw them on a piece of paper, as shown in Figure 2.6.
Figure 2.6 Step 1 of the modeling process.
Next, for each pair of entities, we determine whether there is a relationship between the two. In this case, Students and Courses are indeed related. Therefore, we draw a line to represent that relationship, as shown in Figure 2.7. In our mail-order company, we would have determined that Customers are related to Orders (because customers place orders) and that Orders are related to Items (because items appear on orders) but that Customers are not related to Items. It might be helpful to place some wording on the drawing to describe how the two entities are related.
Figure 2.7 Step 2 of the modeling process.
The next step is to determine the nature of the relationship. To do this, you need to ask yourself two questions for each related pair of entities: For each occurrence of the first entity, how many occurrences might there be of the second entity? In addition, for each occurrence of the second entity, how many occurrences might there be of the first entity? Using our example, for each occurrence of Students, how many occurrences of Courses might there be? If there can be only one occurrence of Courses per occurrence of Students, draw a single arrowhead next to the Courses entity. If there can be more than one, draw two arrowheads. If there can be zero occurrences of Courses for each occurrence of Students, also draw a 0 on the line just before the arrowheads. In our example, a student can take zero, one, or more than one course. For any given occurrence of Courses, how many occurrences of Students might there be? For purposes of this exercise, we will assume that it can be one or many but not zero. The result is shown in Figure 2.8. When you have double arrows at each end, the relationship is called a many-to-many relationship. The 0 drawn at the Courses end marks what is called a conditional relationship.
Figure 2.8 Step 3 of the modeling process identifies the nature of the relationship between the two entities.
A relational database cannot enforce a many-to-many relationship directly. Therefore, you need to resolve this problem by creating a logical entity called an assignment entity or a relationship entity. Instead of relating the students and courses to each other, relate them each to the new entity, which we call Student_Courses. Because every entity must have a primary
key, we assign one to the Student_Courses entity. Whenever you create a relationship entity, its primary key is always a combination of the primary keys of the two other entities: in this case, Stud_ID and Course_ID. A primary key that consists of more than one item is called a compound key. The relationship entity always has a one-to-many relationship with both of the original entities, and the new entity is always at the many end of the relationship (it is, therefore, the child entity). The revised diagram is shown in Figure 2.9.
Figure 2.9 Step 4 of the modeling process resolves any many-to-many relationships by creating relationship entities.
Next, you need to assign foreign keys to enforce the relationships between the tables. Remember that a foreign key means that for each occurrence of a child entity, there must be a corresponding value in the parent entity. In entity-relationship modeling, the foreign key of the child table always corresponds to the primary key of the parent table. Also, remember that wherever a one-to-many relationship occurs, there is a foreign key relationship. Figure 2.10 shows two foreign keys in the Student_Courses entity. The Stud_ID that appears in the Student_Courses entity is a foreign key referencing the Students entity.
Figure 2.10 Step 5 of the modeling process assigns foreign keys to enforce referential integrity rules.
The last step is to assign attributes, which are things that describe the entities. The attributes always include the primary key and any foreign keys. Sometimes, there is room to assign attributes on the same drawing as the relationship diagram, but typically, it is done on a separate piece of paper using a matrix, as shown in Figure 2.11.
Figure 2.11 Step 6 of the modeling process assigns attributes to each of the entities.
With the output of the entity-relationship modeling process, you have the information that you need to create a physical database design from the logical one (as represented by the final model). Entities become tables, attributes become columns, and primary and foreign keys are identified. However, you
should first validate your database design by normalizing it. This process is described in the next section.
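To make the mapping concrete, here is a sketch of the DDL that the college model might translate into. The key columns come from the figures; the descriptive columns and data types are assumptions added for illustration.

CREATE TABLE STUDENTS
  (Stud_ID SMALLINT NOT NULL,
   Stud_Name VARCHAR (30),  -- assumed attribute
   CONSTRAINT Pk_Stud PRIMARY KEY (Stud_ID))

CREATE TABLE COURSES
  (Course_ID SMALLINT NOT NULL,
   Course_Title VARCHAR (30),  -- assumed attribute
   CONSTRAINT Pk_Course PRIMARY KEY (Course_ID))

-- The relationship entity: a compound primary key plus one foreign
-- key to each parent; it sits at the many end of both relationships
CREATE TABLE STUDENT_COURSES
  (Stud_ID SMALLINT NOT NULL,
   Course_ID SMALLINT NOT NULL,
   CONSTRAINT Pk_Stud_Course PRIMARY KEY (Stud_ID, Course_ID),
   CONSTRAINT Fk_SC_Stud FOREIGN KEY (Stud_ID)
     REFERENCES STUDENTS (Stud_ID),
   CONSTRAINT Fk_SC_Course FOREIGN KEY (Course_ID)
     REFERENCES COURSES (Course_ID))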
Database Normalization
There is a lot of overlap between the entity-relationship modeling process discussed in the last section and the database normalization process discussed in this section. It is quite conceivable that you can properly design a database using either approach alone. However, I don't recommend it, because each approach has advantages that the other doesn't. The entity-relationship modeling process is particularly well suited for identifying database integrity constraints, although it offers little help to ensure that attributes are assigned to the proper entities. Database normalization seeks to do just this: verify attribute assignment by the application of standardized rules. We do this by comparing our design to a series of normal forms. We say that our database is fully normalized when it is in third normal form. Each normal form seeks to eliminate data redundancy and to eliminate insert, update, and delete anomalies. In terms of database operations, an anomaly is an action that either produces an undesirable effect or prevents us from performing a desired action in the database. I discuss these anomalies throughout this section. For purposes of illustration, I use a simplified order-entry system. The order-entry system has the following pieces of information to be stored in the database:
ORDERS
  Ord_No
  Ord_Date
  Cust_No
  Cust_Name
  Cust_Address
  Cust_City
  Cust_State
  Cust_Postal_Code
  Item_No
  Item_Desc
  Item_Qty
  Item_Price
Further, I make the assumptions that a customer can place any number of orders, that any given order has only one customer, and that an order can be placed for any number of items. (In other words, a customer can call and purchase a number of different items all on the same order.) However, any one item can appear on the order only once. To begin, we need to assign a primary key to our present design. We will call the primary key Ord_No.
First normal form (1NF) requires that a table contain no repeating groups. Because an order can include many items, the item columns (Item_No, Item_Desc, Item_Qty, and Item_Price) repeat for each item on an order, so we move them to a new ORDER_ITEMS table whose primary key is the compound key of Ord_No plus Item_No. Unfortunately, we cannot stop here because we still have some anomalies. First, we have an insert anomaly in that we cannot add an item to the database unless there is an order for that item (because the item number and item description are not recorded until there is a row in the ORDER_ITEMS table). Second is an update anomaly: If an item's description changes, it must be updated on every single order for that item. Third is a delete anomaly: If an order is deleted, the item's description is deleted as well. Therefore, we need to move on to second normal form.
Second normal form, usually abbreviated as SNF or 2NF, states that we can have no partial-key dependencies. A partial-key dependency is a column that is dependent on only part of the primary key. To determine whether we have any second normal form violations, we examine all tables with compound keys (in this case, the ORDER_ITEMS table). An examination of the table shows that Item_Desc is dependent on Item_No in the primary key but not on Ord_No. In other words, an item's description changes if the Item_No is different but not if the Ord_No is different. To resolve this problem, we move the partial-key dependencies to another table whose primary key is the partial key from the original table (in this case, Item_No). Our new design looks like the following:
ORDERS
  Ord_No (PK)
  Ord_Date
  Cust_No
  Cust_Name
  Cust_Address
  Cust_City
  Cust_State
  Cust_Postal_Code
ORDER_ITEMS
  Ord_No (PK)
  Item_No (PK)
  Item_Qty
  Item_Price
ITEMS
  Item_No (PK)
  Item_Desc
This resolves our insert anomaly from the first normal form because we can now insert a new item into the database even if there are no orders for it. It also resolves our update anomaly because we now have only one row to update if an item's description changes. Finally, we have resolved our delete anomaly because deleting an order no longer deletes the associated items. However, we still have anomalies. We have an insert anomaly in that we can't add a customer to the database unless that customer places an order. We have an update anomaly in that if we need to change the customer's name or city, we have to update every order that customer has placed. We have a delete anomaly in that if we delete an order, we also delete the customer. To resolve this, we move on to third normal form.
Third normal form (3NF) requires that every column be dependent on the whole primary key and nothing else. The customer columns (Cust_Name, Cust_Address, and so on) depend on Cust_No, not on Ord_No, so we move them to a new CUSTOMERS table whose primary key is Cust_No, leaving Cust_No on ORDERS as a foreign key. We are now in third normal form. Normally, you can stop at third normal form and conclude that your database is fully normalized. There are actually other normal forms: Boyce-Codd normal form, fourth normal form, and fifth normal form. These normal forms are used for rare circumstances, such as resolving circular references, and nearly always indicate a totally inappropriate data model. In my 15 years of working with relational databases, I have never needed to consider anything beyond third normal form. Sometimes, you need to step back and examine the design with a critical eye toward performance. If you see that you are going to end up having to join too many tables, you might want to consider selectively denormalizing back to second or first normal form. My recommendation is to build the database in third normal form, load it with some sample data, and then run some benchmark tests against it to see if you have any performance bottlenecks. If you do, examine those bottlenecks and decide whether they warrant denormalization.
Data Definition Language
DDL consists of three statements: CREATE, ALTER, and DROP. For purposes of this discussion, the next three sections deal with using these statements to work with tables. Following that, I discuss other database objects, such as indexes and keys.
CREATE TABLE
The CREATE TABLE statement has the syntax:
CREATE TABLE TABLE_NAME
  (col_name DATATYPE [NOT NULL],
   col_name ...)
Following CREATE TABLE, specify the name of the table. Inside parentheses, list the columns, separated by commas. After each column name, list the data type and, optionally, NOT NULL if the column is not allowed to have a null value. The syntax to create the EMPLOYEE table that I use throughout the remainder of this book is as follows:
CREATE TABLE EMPLOYEE
  (Emp_No SMALLINT NOT NULL,
   Emp_LName VARCHAR (21) NOT NULL,
   Emp_FName VARCHAR (15),
   Emp_SSN CHAR (9),
   Emp_DOB DATE,
   Emp_Hire_Date DATE NOT NULL,
   Emp_Term_Date DATE,
   Emp_Salary NUMERIC (9,2),
   Emp_Dept_No SMALLINT,
   Emp_Mgr_ID SMALLINT,
   Emp_Gender CHAR (1),
   Emp_Health_Ins CHAR (1),
   Emp_Dental_Ins CHAR (1),
   Emp_Comments VARCHAR (255)
  )
TIP About The Tables Throughout this book, I use a series of tables, including the EMPLOYEE table here. Appendix A contains the syntax to create all of these tables along with primary keys, foreign keys, and so on. Because there are small differences between different databases, the CD-ROM contains these statements in RDBMS-specific directories. For instance, the Oracle CREATE statements are in the directory \SAMPDATA\ORACLE. If your database is not included, it is highly likely that one of the supplied syntax variations will work just fine. Otherwise, the modifications should be simple. Additionally, the CD-ROM includes the INSERT statements needed to create the data.
You can optionally specify various table constraints, such as primary keys, at the same time you create the table. The following syntax is for Sybase SQL Anywhere, but if you change the line Emp_Salary NUMERIC (9,2) to read Emp_Salary NUMBER (9,2), it will work with Oracle as well:
CREATE TABLE EMPLOYEE
  (Emp_No SMALLINT NOT NULL,
   Emp_LName VARCHAR (21) NOT NULL,
   Emp_FName VARCHAR (15),
   Emp_SSN CHAR (9),
   Emp_DOB DATE,
   Emp_Hire_Date DATE NOT NULL,
   Emp_Term_Date DATE,
   Emp_Salary NUMERIC (9,2),
   Emp_Dept_No SMALLINT,
   Emp_Mgr_ID SMALLINT,
   Emp_Gender CHAR (1),
   Emp_Health_Ins CHAR (1),
   Emp_Dental_Ins CHAR (1),
   Emp_Comments VARCHAR (255),
   CONSTRAINT Pk_Emp_Id PRIMARY KEY (Emp_No),
   CONSTRAINT Fk_Emp_Dept FOREIGN KEY (Emp_Dept_No)
     REFERENCES DEPARTMENT (Dept_No)
  )
This syntax presupposes that the table DEPARTMENT already exists. If it doesn't, you need to create that table first. When you create tables and add constraints with the same code, figuring out which table needs to be built first can get complicated. I recommend that you create all your tables first and then add the constraints later with ALTER statements.
ALTER TABLE
The ALTER TABLE command changes a table's structure in some way. Exactly what you can change and when you can change it is somewhat dependent on the exact RDBMS. However, you can generally add a column at any time using the following syntax:
ALTER TABLE EMPLOYEE ADD Emp_Maiden CHAR (21) NOT NULL
This example adds a column named Emp_Maiden with a data type of CHAR (21). It is specified NOT NULL. The column is always added to the end of the table. You cannot, unfortunately, delete a column. You can add constraints to a table using syntax similar to the following:
ALTER TABLE EMPLOYEE ADD CONSTRAINT Pk_Emp_Id PRIMARY KEY (Emp_No)

ALTER TABLE EMPLOYEE ADD CONSTRAINT Fk_Emp_Dept
  FOREIGN KEY (Emp_Dept_No) REFERENCES DEPARTMENT (Dept_No)
This example adds a primary key to the EMPLOYEE table and a foreign key that references the Dept_No column in the DEPARTMENT table. Note that many databases assume the primary key when you don't specify the column name on the parent table. In other words, the following statement causes SQL to assume that the primary key of the DEPARTMENT table is being referenced:
ALTER TABLE EMPLOYEE ADD CONSTRAINT Fk_Emp_Dept
  FOREIGN KEY (Emp_Dept_No) REFERENCES DEPARTMENT
Some databases allow you to modify the definition of an existing column as long as the currently stored data in that column does not violate the new column definition. For instance, you can specify a column as NOT NULL even if it was not originally defined that way, as long as there are no null values already stored in the column:
ALTER TABLE EMPLOYEE MODIFY Emp_Gender NOT NULL
DROP TABLE
The DROP TABLE command deletes a table and all of its data from the database. Also dropped are all integrity constraints and permissions (I discuss permissions under Data Control Language later in this chapter). The syntax is:
DROP TABLE EMPLOYEE
Indexes
The database allows you to add indexes on one or more columns of any table. An index may be unique, meaning that it does not permit duplicate values, or non-unique. You can also use a unique index as a constraint to restrict the values in a column on a table. For instance, you might want to add a unique index on the Social Security number of an employee table. The syntax to create an index is as follows:
CREATE [UNIQUE] INDEX INDEX_NAME ON TABLE_NAME (col_name, col_name ...)
The TABLE_NAME must be any valid table name. The col_name must be a valid column on that table. If more than one is listed, the column names must be comma-delimited. The following examples add a unique index on the Emp_SSN column and a non-unique index on the Emp_LName and Emp_FName columns:
CREATE UNIQUE INDEX IDX_SSN ON EMPLOYEE (Emp_SSN)

CREATE INDEX IDX_NAME ON EMPLOYEE (Emp_LName, Emp_FName)
Most databases allow you to create one clustered index per table. With a clustered index, the rows are arranged in the same order as the index, which speeds up retrieval. Because the syntax for creating clustered indexes varies greatly from database to database, check your RDBMS documentation (a sketch for one product appears below). You will want to use indexes judiciously. They can speed up database operations tremendously when you often access data in a certain manner. For instance, if you need to list employees in last name, first name order, an index tremendously speeds up retrieval time. On the other hand, indexes need to be maintained. If you update a table on the database, you must update all the indexes on that table as well. Thus, you want to weigh faster retrieval against the overhead of maintaining each index when deciding which columns to index.
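As one concrete example of that vendor-specific syntax, Microsoft SQL Server (and Sybase) use a CLUSTERED keyword; the sketch below arranges the employee rows in name order.

-- Microsoft SQL Server / Sybase syntax; other RDBMSs differ
CREATE CLUSTERED INDEX IDX_LNAME
ON EMPLOYEE (Emp_LName, Emp_FName)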
Integrity Constraints
In the previous discussion of CREATE TABLE and ALTER TABLE, I presented some sample syntax for creating primary key and foreign key constraints. Different RDBMSs have some variance in how to create these constraints, and most allow you to create constraints in a variety of different manners. The syntax that I presented is fairly generic and works on most RDBMSs. For other options, consult your RDBMS documentation. A couple of notes are in order. Consider the following statements, which add three constraints to the EMPLOYEE table:
ALTER TABLE EMPLOYEE ADD CONSTRAINT Pk_Emp_Id PRIMARY KEY (Emp_No)

ALTER TABLE EMPLOYEE ADD CONSTRAINT Fk_Emp_Dept
  FOREIGN KEY (Emp_Dept_No) REFERENCES DEPARTMENT (Dept_No)

ALTER TABLE EMPLOYEE ADD CONSTRAINT Ch_Gender
  CHECK (Emp_Gender IN ('M','F'))
The first example adds a primary key constraint to the table, forcing the Emp_No column to be a unique identifier for the table. The second adds a foreign key constraint, requiring that the value stored in Emp_Dept_No be a valid value in the Dept_No column on the DEPARTMENT table. Because the Emp_Dept_No column was not defined as NOT NULL, the column may also contain null values. (This is an example of a conditional foreign key constraint; it actually says that if the value is not null, it must be a valid value from the parent table.) The third is an example of a check constraint, which is not supported by all RDBMSs. A check constraint restricts the values of a column by providing an SQL function. In this case, it specifies that gender must be M or F. The constraint could also reference another column on the same row, as in the following example:
ALTER TABLE EMPLOYEE ADD CONSTRAINT Ch_Hire_Date
  CHECK (Emp_Hire_Date > Emp_DOB)
This example requires that hire date be greater than date of birth. You cannot use an SQL SELECT in a check constraint. When a check constraint references another column on the same row, it is called a user integrity constraint. When it provides an absolute check (such as being only M or F without regard for other columns on the row), it is called a domain integrity constraint. Generally, you do not need to provide a constraint name. If you do not, the database generates one. My recommendation is that you do provide a name so that any messages from the database are meaningful. For instance, if you attempt to set Emp_Gender = 'X', the database responds with a message similar to Update Failed - Integrity Constraint: CH_GENDER. If you elect not to name the constraint, you should also omit the CONSTRAINT keyword, as in the following example:
ALTER TABLE EMPLOYEE ADD CHECK (Emp_Gender IN ('M','F'))
Views
A view is an SQL SELECT permanently stored on the database. A view can be convenient for your users because it helps them avoid typing complicated queries. It can also be used to show a user a portion of a table without revealing the entire underlying table. The following example creates a view on the EMPLOYEE table that allows users to see information such as name but not salary:
CREATE VIEW EMPNOSAL AS
  SELECT Emp_No, Emp_LName, Emp_FName, Emp_Dept_No, Emp_Mgr_ID
  FROM EMPLOYEE
To use the view, select from it as though it were a table:
SELECT * FROM EMPNOSAL ORDER BY Emp_LName, Emp_FName
A view is not a copy of the data, just a stored query. Also note that most databases allow you to update through a view, as in the following example:
UPDATE EMPNOSAL SET Emp_Dept_No = 200 WHERE Emp_No = 100
In the example, the employee's department number was updated (I discuss the syntax for the UPDATE command in Chapter 3). Because the view EMPNOSAL does not contain the Emp_Salary column, the user can't update salary information.
Data Control Language
The two commands used are GRANT and REVOKE. For a user to have access to the database, he or she must first be granted connect privileges:
GRANT CONNECT TO user_ID IDENTIFIED BY password
This syntax creates a new user and password. Assuming the user ID is Jane, to alter the password, use the following syntax:
GRANT CONNECT TO Jane IDENTIFIED BY new_password
To grant access to individual objects (such as a table), you grant a comma-delimited list of permissions to a comma-delimited list of users on one table at a time:
GRANT SELECT, UPDATE, DELETE ON EMPLOYEE TO Tom, Dick, Harry
To grant all permissions, use GRANT ALL (some RDBMSs spell it GRANT ALL PRIVILEGES). To grant permissions to all users, specify PUBLIC instead of a user list:
GRANT SELECT ON DEPARTMENT TO PUBLIC
You can selectively revoke permissions, but you cannot revoke access from yourself. The syntax to revoke privileges is similar to the following:
REVOKE UPDATE, DELETE ON EMPLOYEE FROM Dick, Harry
In this example, we have taken away update and delete privileges from users Dick and Harry. Their select privileges remain. Most databases allow you to create groups and assign users to groups. You can then grant permissions to and revoke permissions from the groups. This technique helps simplify administration; a sketch appears below.
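As a sketch of the group technique in Sybase SQL Anywhere terms (the group and user names are hypothetical; other RDBMSs use different syntax, such as roles in Oracle):

-- Create a user ID, mark it as a group, and give the group
-- its table permissions
GRANT CONNECT TO Clerks
GRANT GROUP TO Clerks
GRANT SELECT, UPDATE ON EMPLOYEE TO Clerks

-- Enroll individual users; they inherit the group's permissions
GRANT MEMBERSHIP IN GROUP Clerks TO Tom
GRANT MEMBERSHIP IN GROUP Clerks TO Dick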
-----------
query result. (I expanded upon this in Chapter 2 when I defined the term RDBMS.)
What Is SQL?
The American National Standards Institute (ANSI) defines various computer industry standards. We are all familiar with the ANSI character set, for instance, which defines each character to be 1 byte long, with the decimal value 65 representing the letter A. SQL-92 is an ANSI definition of what features must be supported in a full implementation of SQL. For example, the language needs to support the SELECT statement and the SUM function. If a vendor's database product is in full compliance with these standards, it is ANSI-92 compliant. The major relational database products, such as Oracle and Microsoft SQL Server, are substantially ANSI-92 compliant. (Microsoft Access uses Jet SQL. Although Jet SQL is close, it is not ANSI-92 compliant. I discuss the differences in Appendix B.) Additionally, most vendors add extensions (additional features) implemented as what are commonly called dialects of SQL. Oracle uses PL/SQL, whereas Sybase and Microsoft use Transact-SQL (usually called T-SQL). Microsoft has been moving away from the Sybase model of T-SQL. Mostly, the dialects are close enough in syntax that moving from one to the other is relatively easy. When application programs access a relational database, they create SQL statements and send them to the RDBMS. The RDBMS processes the statements and sends the results back to the application program. Each RDBMS exposes its functionality via an Application Programming Interface (API). To shelter the developer from the intricacies of learning different APIs for each RDBMS, three categories of drivers were developed to interface with Visual Basic:
- VBSQL: A Visual Basic-specific set of drivers that allows a native interface to Microsoft SQL Server.
- ODBC (Open Database Connectivity): An API that applications can address to expose the underlying functionality of a wide variety of databases. The many variations on how this is done are the subject of much of this book. Appendix C overviews the implementation of ODBC.
- OLE DB: A new, still evolving API that exposes the underlying functionality of a wide variety of databases in an object-oriented manner. OLE DB also allows the joining of disparate data sources.
With ODBC, the application communicates with a single API, which in turn communicates with ODBC drivers written for each of the different back-end databases. Figure 3.1 illustrates this communication process.
Figure 3.1 Client programs communicate with the ODBC API, which in turn communicates with the individual database-specific drivers. These drivers interact with the back-end databases.

In reality, not all ODBC drivers are created equal. There are actually three levels of ODBC compliance:

Level 0, sometimes called Minimum ODBC, is a set of minimal functionalities that must be supplied by an ODBC driver, including the ability to perform basic SQL SELECTs and so on.

Level 1, sometimes called Core ODBC, is a more thorough implementation of SQL, including statements such as ALTER TABLE as well as more advanced SELECT functionality, such as the MAX and MIN functions.

Level 2, or Extended ODBC, is an implementation of SQL that is at least ANSI-92 compliant, including outer joins, cursor-positioned UPDATEs, and so on.

The exact definition of each of these compliance levels is discussed in Appendix C. It is not always easy to determine the level of conformance of a given ODBC driver, and the performance of drivers varies from vendor to vendor. Microsoft builds a variety of drivers for both Microsoft and non-Microsoft back-end database products. Intersolv also markets a wide variety of drivers that you can use in your programs. Finally, some vendors create drivers for their own database products; the most notable example is the very well implemented, Level 2-compliant Sybase SQL Anywhere database. For technical information about ODBC standards and driver development, refer to Appendix C in this book, as well as to the ODBC 3.0 Programmer's Reference And SDK Guide available from Microsoft Press.
SELECT
The SELECT statement is used to retrieve rows from the database. The basic syntax of the SELECT statement is:

SELECT [DISTINCT] column-list | *
FROM table-list
[WHERE condition1 AND condition2]
[GROUP BY column-list [HAVING condition1 AND condition2]]
[ORDER BY column-list]

All clauses except SELECT and FROM are optional. Use the SELECT clause to list those columns that are to appear in the result set. Each column name must be separated by a comma. To select all columns, you can use the asterisk instead of listing each column separately. Use the FROM clause to list the tables that contain the columns being referenced. For more than one table, separate the table names with commas. The following example selects three columns from the EMPLOYEE table:
SELECT EMP_ID, EMP_FNAME, EMP_LNAME FROM EMPLOYEE

The next example selects all the columns from the EMPLOYEE table:

SELECT * FROM EMPLOYEE
Derived Columns
SQL allows you to select columns that are not actually stored in the database by creating derived columns. These columns might be string literals, system functions, or the results of operations on table columns.
TIP About The Examples Except where specifically noted, all of the examples in this chapter use the sample data provided on the CD-ROM. See Appendix A for details on loading this information into your database.
The following SELECT creates a derived column that is the result of multiplying EMP_SALARY by 1.08:

SELECT EMP_SALARY * 1.08 FROM EMPLOYEE

The next example creates a derived column that is actually a string literal:

SELECT EMP_FNAME, EMP_LNAME, 'IS AN EMPLOYEE' FROM EMPLOYEE

As shown in the following result set, the string literal is returned for every row:

EMP_FNAME  EMP_LNAME  'IS AN EMPLOYEE'
---------  ---------  ----------------
ANN        CALLAHAN   IS AN EMPLOYEE
BOB        JOHNSON    IS AN EMPLOYEE
CAROL      DEMORA     IS AN EMPLOYEE
Some system functions are not truly related to any data at all. For instance, SYSDATE returns the current date and time from the database server. Even still, the SELECT statement must have a FROM clause. Some RDBMSs provide a dummy table from which to select such non-data-related functions:

SELECT SYSDATE FROM DUAL

This example is from Oracle, which provides a table named DUAL with a single row so that you can select against system-level functions. If you select SYSDATE from the EMPLOYEE table, and if the table has 100 rows, you get a result set of 100 lines of the current date and time.
If your database does not provide a dummy table, you can easily create one:

CREATE TABLE DUMMY (DUM_COL CHAR(10))
INSERT INTO DUMMY VALUES ('DUMMY')
SELECT SYSDATE FROM DUMMY

When the result set is displayed, the column headings default to the column names. You can rename a column heading using the AS keyword:

SELECT EMP_NO AS 'EMPLOYEE NUMBER' FROM EMPLOYEE

Some databases allow you to omit the AS keyword. If the new column name contains a space, you must surround the column name with double quotes. Some databases require that you use double quotes all the time (regardless of whether there is a space in the name).
The following WHERE clause returns all female employees earning more than $50,000 as well as any employee earning more than $75,000 (regardless of gender):

WHERE (EMP_GENDER = 'F' AND EMP_SALARY > 50000)
   OR EMP_SALARY > 75000

You may use a scalar function in a WHERE clause but not an aggregate function. I discuss SQL functions later in this chapter, but for now, understand that a scalar function is one that operates on a single value at a time (such as returning the length of a last name), whereas an aggregate function operates on a range of values (such as returning the sum of all salaries). The following example returns rows where the length of the last name is greater than five:

WHERE LENGTH(EMP_LNAME) > 5

A column used in the WHERE clause does not have to appear in the SELECT list, as shown in the following example:

SELECT EMP_ID, EMP_FNAME, EMP_LNAME FROM EMPLOYEE
WHERE EMP_GENDER = 'F'

You can perform wild-card searches using the LIKE keyword. To do so, use the underscore (_) character to represent a single character and the percent sign (%) to represent any number of characters. The following statement searches for all last names beginning with the letter B:

WHERE EMP_LNAME LIKE 'B%'
The next example locates all those rows where the last names have the letter B in the second position and the letter D in the fourth position:

WHERE EMP_LNAME LIKE '_B_D%'

The WHERE clause also supports the use of the BETWEEN and IN modifiers. With BETWEEN, you can specify an inclusive range of values, as in the following example, which returns rows where the salary is from $50,000 to $60,000:

WHERE EMP_SALARY BETWEEN 50000 AND 60000

The IN keyword allows you to search a list of discrete values. You separate each value with commas and place the entire list inside parentheses. If the values are strings, they must be in single quotes:

WHERE EMP_LNAME IN ('BROWN', 'SMITH', 'JOHNSON')
TIP About Databases And Case Sensitivity When dealing with character data (strings), the database is case sensitive just as Visual Basic is case sensitive when comparing two strings. Later in this chapter, I will introduce case conversion functions to help deal with this problem. Though the SQL code samples in this book have all keywords in uppercase, databases are generally not case sensitive with column names unless you surround the column names with double quotes.
Testing for the existence of null values requires special operators. Because any operation involving null is automatically null (I discuss this fully later in this chapter), SQL provides the IS NULL and IS NOT NULL operators. The following clause locates rows where YEARS is null:

WHERE YEARS IS NULL
Joining Tables
When you select data from more than one table, you need to tell SQL how the tables are related. You do this with the WHERE clause. Consider the DEPARTMENT and EMPLOYEE tables (see Appendix A for how the tables are defined). If I want to produce a list of departments and all the employees in each of those departments, I need to relate the EMPLOYEE table to the DEPARTMENT table. Each row on the EMPLOYEE table has an EMP_DEPT_NO column, and the DEPARTMENT table has a DEPT_NO column. Therefore, given a row in the DEPARTMENT table, I want to list all rows in the EMPLOYEE table where DEPT_NO is equal to EMP_DEPT_NO. This forms a join condition:

SELECT DEPT_NO, DEPT_NAME, EMP_NO, EMP_FNAME, EMP_LNAME
FROM EMPLOYEE, DEPARTMENT
WHERE DEPT_NO = EMP_DEPT_NO

Although I did not do so, many developers use common column names between tables. Whereas I used EMP_DEPT_NO in the EMPLOYEE table to store the department number, others might have simply used DEPT_NO. Obviously then, the two tables both have a column with the same name (DEPT_NO). There is nothing wrong with this at all. However, if an SQL query references a column name that appears in more than one table in the query, you have to qualify the column name with the table name to avoid ambiguity:

SELECT DEPARTMENT.DEPT_NO, DEPT_NAME, EMP_NO, EMP_FNAME, EMP_LNAME
FROM EMPLOYEE, DEPARTMENT
WHERE EMPLOYEE.DEPT_NO = DEPARTMENT.DEPT_NO

This example shows how to qualify the column names where ambiguity exists. The example assumes, of course, that both tables use the identical column name DEPT_NO.
Table Aliasing
Typing in all those table names can be tiring, especially when qualifying column names. SQL allows you to alias table names in the FROM clause:

FROM EMPLOYEE EMP, DEPARTMENT DEP

Once you have aliased a table, you use the alias instead of the table name elsewhere in the query:

SELECT EMP.EMP_NO, DEP.DEPT_NO

Aliasing also has implications in correlated subqueries and self-joins, both of which I discuss later in this chapter. Aliased names are often referred to as correlation names, particularly when dealing with subqueries.
You can join together more than two tables. If in the previous example I want to add the location name (from the LOCATION table), I have to join that table to one of the other two tables. The DEPARTMENT table has a column DEPT_LOC_ID that corresponds to the LOC_ID column in the LOCATION table. Therefore, my query might look like this:

SELECT DEPT_NO, DEPT_NAME, LOC_NAME, EMP_NO, EMP_FNAME, EMP_LNAME
FROM EMPLOYEE, DEPARTMENT, LOCATION
WHERE DEPT_NO = EMP_DEPT_NO
AND DEPT_LOC_ID = LOC_ID

Sample output based on the above query is shown here:

DEPT_NO  DEPT_NAME   LOC_NAME   EMP_NO  EMP_FNAME  EMP_LNAME
-------  ----------  ---------  ------  ---------  ---------
100      ACCOUNTING  NORTHEAST  133     MAUREEN    PODANSKI
100      ACCOUNTING  NORTHEAST  101     ANN        CALLAHAN
100      ACCOUNTING  NORTHEAST  105     EUNICE     BROWN
100      ACCOUNTING  NORTHEAST  113     MICHAEL    ANDERSON
Self-Joins
A confusing concept for many developers is that you can join a table to itself. Consider the EMPLOYEE table: One of the columns is EMP_MGR_ID, which represents the manager of the employee. If you select the contents of the EMPLOYEE table, you will see that employee number 126's (David Madison's) manager is employee 110 (John Smith). Assume I want a list of all employees and the names of their managers. In Figure 3.2, you can see where I need to join a column from one row (EMP_MGR_ID) with a column in another row (EMP_NO). Essentially, I need to pretend that there are two copies of the EMPLOYEE table.
Figure 3.2 To find the name of David Madison's manager, I need to reference the EMP_MGR_ID column on David's row and use it to find John Smith's row.

The query that I put together uses two aliases for the EMPLOYEE table. I use the first alias (EMP) to find the employee information and the second (MGR) to find the manager information:

SELECT EMP.EMP_NO AS 'EMP NO', EMP.EMP_FNAME AS 'EMP FIRST',
       EMP.EMP_LNAME AS 'EMP LAST', MGR.EMP_NO AS 'MGR ID',
       MGR.EMP_FNAME AS 'MGR FIRST', MGR.EMP_LNAME AS 'MGR LAST'
FROM EMPLOYEE EMP, EMPLOYEE MGR
WHERE EMP.EMP_MGR_ID = MGR.EMP_NO
The join is on the EMP_NO column from the MGR copy of the EMPLOYEE table to the EMP_MGR_ID column of the EMP copy of the table. Sample output based on the query is presented here. Note that Ann Callahan has no manager:

EMP NO  EMP FIRST  EMP LAST  MGR ID  MGR FIRST  MGR LAST
------  ---------  --------  ------  ---------  -----------
111     KATHRYN    AMES      119     URSULA     SMITHSONIAN
113     MICHAEL    ANDERSON  101     ANN        CALLAHAN
106     FRANK      BENSON    110     JOHN       SMITH
105     EUNICE     BROWN     101     ANN        CALLAHAN
101     ANN        CALLAHAN  101     ANN        CALLAHAN
Self-joins are not terribly difficult once you get used to the idea of joining a table to itself; they can often be useful to find information in one row that is related to information in another row in the same table.
ORDER BY

The following result set shows employees sorted by department and, within each department, by salary in descending order:

EMP_DEPT  EMP_FNAME  EMP_LNAME  EMP_SALARY
--------  ---------  ---------  ----------
100       ANN        CALLAHAN   127500.00
100       MICHAEL    ANDERSON   72563.34
100       ROSE       DANIELS    67572.42
100       EUNICE     BROWN      61426.08
100       CHARLES    GERRFON    53496.87
100       WINIFRED   VANCE      37712.10
100       MAUREEN    PODANSKI   34591.90
100       GEORGE     DE NORVA   21313.97
200       DAVID      MADISON    77492.47
200       HENRIETTA  KEOUGH     72789.20
200       BRIAN      JOHNSON    72750.62
The columns that you order by do not have to appear in the SELECT list, although it makes no sense not to include them. (How would the reader know that data was sorted by department if you did not display the department?) You can order positionally by specifying the column number instead of the column name, as shown in this example:

ORDER BY 1, 4 DESC

However, it is likely that the next ANSI SQL standard will remove this option from the SQL definition. You may, therefore, be better off ordering by the column names themselves. If you have renamed the columns (see the discussion of SELECT earlier in this chapter), you can order by the new column name as shown:

SELECT EMP_FNAME AS 'FIRST', EMP_LNAME AS 'LAST'
FROM EMPLOYEE
ORDER BY 'LAST', 'FIRST'

Under certain circumstances, you must order by the new column names. An example is performing a UNION (which I discuss next).
UNION

A UNION combines the result sets of two SELECTs into a single result set. Because the column names can differ between the two queries, you refer to the combined columns by their renamed headings or by position:

SELECT EMP_NO AS 'NUMBER', EMP_LNAME AS 'NAME' FROM EMPLOYEE
UNION
SELECT DEPT_NO, DEPT_NAME FROM DEPARTMENT
ORDER BY 2, 1
SQL Functions
SQL provides a number of functions to manipulate information from the database. Functions fall into two categories: Scalar functions act on one item at a time, and aggregate functions act on a range of values at a time. Deciding whether a function is scalar or aggregate is more than an academic exercise. Consider the WHERE and HAVING clauses (I discuss HAVING later in this chapter):

WHERE clause: You can use a scalar function as a search condition in a WHERE clause. For instance, WHERE SQRT(EMP_SALARY) > 5000 (SQRT returns the square root of an expression) is valid, but WHERE SUM(EMP_SALARY) > 250000 is not.

HAVING clause: You can use aggregate functions in a HAVING clause to restrict a result set. For instance, HAVING SUM(SALARY) > 250000 is valid, but HAVING SQRT(EMP_SALARY) > 5000 is not.

Whether a function is scalar or aggregate also has an impact on the handling of null values.
Null Handling
A value of NULL stored in a database creates special problems. Consider the following:

100 + NULL = ?

Nonintuitively, the answer to this equation is NULL. Any operation performed on NULL is automatically NULL. Assume that you want to give every employee a $100-a-week raise, and you want to see what the new weekly salary for each employee would be. You might code the following query:

SELECT EMP_FNAME, EMP_LNAME, EMP_SALARY/52 + 100 FROM EMPLOYEE

The SELECT works fine except for those employees whose salary happens to be NULL. The result for those employees is still NULL. Different databases handle this problem with various functions. With Oracle, you can use the NVL function:

SELECT EMP_FNAME, EMP_LNAME, NVL(EMP_SALARY, 0)/52 + 100 FROM EMPLOYEE

In this case, the NVL function specifies that if EMP_SALARY is NULL, then the value 0 should be substituted. Some other RDBMSs implement the similar function ISNULL:

SELECT EMP_FNAME, EMP_LNAME, ISNULL(EMP_SALARY, 0)/52 + 100 FROM EMPLOYEE
In this case, the ISNULL function says that if EMP_SALARY is equal to NULL, the value 0 should be substituted. If it is not equal to NULL, the EMP_SALARY value should be used. (The ISNULL function is similar to the IF function in most spreadsheets.)
DISTINCT
Arguably, DISTINCT is not as much a function as it is a SELECT statement qualifier. However, because it operates on a single value, I include it here. Its purpose is to return only the unique values of a column:

SELECT DISTINCT (EMP_DEPT_NO) FROM EMPLOYEE

The result is:

EMP_DEPT_NO
-----------
100
200
300
400

Most databases do not require that the column name be surrounded by parentheses when using DISTINCT (however, it does not hurt to use them).
COUNT
COUNT returns a count of all rows that meet a condition. For instance, COUNT(EMP_DEPT_NO) returns 32 because there are 32 values stored in that column. However, COUNT(DISTINCT(EMP_DEPT_NO)) returns 4 because there are only 4 unique values. Often, COUNT(*) is used to find the number of rows on a table:

SELECT COUNT(*) FROM EMPLOYEE

Also, you can use COUNT(*) in conjunction with the GROUP BY clause to determine counts in categories (I discuss GROUP BY later in this chapter):

SELECT EMP_GENDER AS 'GENDER', COUNT(*) AS 'COUNT'
FROM EMPLOYEE
GROUP BY EMP_GENDER

The following result set is returned:

GENDER  COUNT
------  -----
F       18
M       14
SUM
SUM adds the values in a column. To obtain a total of the salaries from the EMPLOYEE table, run the following query:

SELECT SUM(EMP_SALARY) FROM EMPLOYEE

The following is the result:

SUM(EMP_SALARY)
---------------
1861241.06
MIN, MAX, And AVG

MIN, MAX, and AVG return the minimum, maximum, and average of the values in a column, respectively:

SELECT MIN(EMP_SALARY), MAX(EMP_SALARY), AVG(EMP_SALARY)
FROM EMPLOYEE

The result is:

MIN(EMP_SALARY)  MAX(EMP_SALARY)  AVG(EMP_SALARY)
---------------  ---------------  ---------------
18331.50         127500.00        58163.78
LENGTH
As with the Visual Basic Len function, LENGTH returns the length of a column or expression. LENGTH('Smith') returns 5.
SOUNDEX
SOUNDEX searches for strings based upon their similarity (in sound) to the supplied argument. Although this should be a powerful function, its usefulness can be iffy: RDBMSs have different levels of tolerance for whether two words sound alike and may need some tuning. The following query and result come from Sybase SQL Anywhere. Refer to your RDBMS's documentation regarding the use of SOUNDEX.

SELECT EMP_NO, EMP_FNAME, EMP_LNAME
FROM EMPLOYEE
WHERE SOUNDEX('SMYTHE') = SOUNDEX(EMP_LNAME)

The result of the query is:

EMP_NO  EMP_FNAME  EMP_LNAME
------  ---------  ---------
110     JOHN       SMITH
118     STEVEN     SMITH
103     CAROL      SMITH
LEFT And RIGHT

LEFT returns the leftmost n characters from a column, whereas RIGHT returns the rightmost n characters from a column:

SELECT LEFT(EMP_LNAME, 1), RIGHT(EMP_LNAME, 4) FROM EMPLOYEE

Not all databases support the RIGHT and LEFT functions, but if not, the SUBSTR function is adequate.
Concatenation
Some databases provide specific functions to concatenate two strings. For instance, Oracle offers the CONCAT function, which accepts two arguments (the two strings to be concatenated). Almost all RDBMSs support the concatenation operator, which is simply two pipe characters (the pipe character is usually above the backslash on your keyboard and looks like one vertical dash on top of another). The advantage of the concatenation operator is that you can concatenate as many strings as needed without resorting to nested function calls. The following example shows the concatenation operator in use:
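A query along these lines (a sketch; the exact example did not survive in this excerpt) returns each employee's full name as a single derived column:

SELECT EMP_FNAME || ' ' || EMP_LNAME FROM EMPLOYEE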
The LTRIM function removes any leading spaces from a string; its counterpart, RTRIM, removes trailing spaces. Some RDBMSs also provide a TRIM function, which combines the functionality of LTRIM and RTRIM.
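Because a CHAR column is typically padded with trailing spaces, a sketch along these lines strips the padding before measuring the name:

SELECT EMP_LNAME, LENGTH(RTRIM(EMP_LNAME)) FROM EMPLOYEE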
MOD Or REMAINDER
The MOD function returns the remainder after dividing two numbers. MOD(15, 2) returns 1 because 15 divided by 2 equals 7 with a remainder of 1. This can be a useful function when performing a lot of math at the database server, as sketched below. Some RDBMSs call this function REMAINDER, but the purpose and syntax are otherwise identical.
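For instance, a remainder test lets the server pick out every even-numbered employee without any client-side filtering:

SELECT EMP_NO, EMP_LNAME FROM EMPLOYEE
WHERE MOD(EMP_NO, 2) = 0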
POWER
Similar to the Visual Basic exponentiation operator, POWER raises a number to the power of another number: POWER(2, 4) returns 16 (2 raised to the 4th power). Not all RDBMSs support this function. You can use fractional exponents to return the root of a number: POWER(8, 1.0/3) returns 2, which is the third (cube) root of 8.
Date Functions
The different RDBMSs have a wide range in what functions they provide to support operations on dates. For instance, Oracle has a DAYS_AFTER function, which returns the date that falls n days after the supplied date:

SELECT DAYS_AFTER('02-FEB-99', 365)

This function returns '02-FEB-00'. Sybase uses DAYS:

SELECT DAYS('02-FEB-99', 365)

Check your RDBMS documentation for specific date functions.
GROUP BY
The SQL GROUP BY clause, which allows you to group data, is particularly useful when taking breaks on aggregate functions (such as showing subtotals). When a SELECT has a column with an aggregate function, you must GROUP BY all columns to which no aggregate function is applied. The following query performs a sum on EMP_SALARY and also selects EMP_DEPT_NO and EMP_GENDER. Therefore, you must group by the latter two columns, as shown in this example:

SELECT EMP_DEPT_NO, EMP_GENDER, SUM(EMP_SALARY)
FROM EMPLOYEE
GROUP BY EMP_DEPT_NO, EMP_GENDER
ORDER BY EMP_DEPT_NO, EMP_GENDER

It sometimes confuses developers that they need to GROUP BY all non-aggregate columns, but the example shows it only makes sense: if you are summing salary information, you need to take breaks (show subtotals) on all the other columns. The first rows of the result follow, with one row for each department and gender combination:

EMP_DEPT_NO  EMP_GENDER  SUM(EMP_SALARY)
-----------  ----------  ---------------
100          F           328802.50
100          M           147374.18
...
You can use multiple aggregate functions, as shown in the following example, which compares minimum, maximum, and average salaries for males and females in each department and shows a count for each:

SELECT EMP_DEPT_NO AS 'DEPT', EMP_GENDER AS 'SEX', COUNT(*) AS 'COUNT',
       MIN(EMP_SALARY) AS 'MIN SALARY', MAX(EMP_SALARY) AS 'MAX SALARY',
       ROUND(AVG(EMP_SALARY),2) AS 'AVG SALARY'
FROM EMPLOYEE
GROUP BY EMP_DEPT_NO, EMP_GENDER
ORDER BY EMP_DEPT_NO, EMP_GENDER

The result of the query follows:

DEPT  SEX  COUNT  MIN SALARY  MAX SALARY  AVG SALARY
----  ---  -----  ----------  ----------  ----------
100   F    5      34591.90    127500.00   65760.50
100   M    3      21313.97    72563.34    49124.73
200   F    3      27487.44    72789.20    49038.55
200   M    6      33545.31    77492.47    51904.22
300   F    8      18331.50    104800.00   65632.07
300   M    1      77568.70    77568.70    77568.70
400   F    2      33543.13    42216.41    37879.77
400   M    4      36201.78    73643.82    62034.65
As shown in the result, females in this company tend to make less than their male counterparts in similar jobs except in Department 100. (That was entirely unintentional. All data in the supplied sample tables was generated randomly.) You can also GROUP BY columns that do not appear in the SELECT list. This option mostly has applications in subselects, which I discuss later in this chapter. Sometimes, however, this option is useful even without subselects but in conjunction with the HAVING clause, which I discuss next.
HAVING
The last SELECT clause to cover is the HAVING clause. The HAVING clause allows you to restrict groups of data based on the results of aggregate functions and thus is usually associated with the GROUP BY clause. You will most likely read in some SQL texts that HAVING must be used in conjunction with the GROUP BY clause, but that is not actually true. The following statement, although not very meaningful, executes just fine:
SELECT SUM(EMP_SALARY) FROM EMPLOYEE
HAVING SUM(EMP_SALARY) > 1

The example says to select the sum of all salaries if that sum is greater than one. When used in conjunction with GROUP BY, the HAVING clause is more meaningful. I refine the earlier example of salaries by department and gender to show only those department and gender combinations where the average salary is less than $50,000:

SELECT EMP_DEPT_NO AS 'DEPT', EMP_GENDER AS 'SEX', COUNT(*) AS 'COUNT',
       MIN(EMP_SALARY) AS 'MIN SALARY', MAX(EMP_SALARY) AS 'MAX SALARY',
       ROUND(AVG(EMP_SALARY),2) AS 'AVG SALARY'
FROM EMPLOYEE
GROUP BY EMP_DEPT_NO, EMP_GENDER
HAVING AVG(EMP_SALARY) < 50000
ORDER BY EMP_DEPT_NO, EMP_GENDER

The new report follows:

DEPT  SEX  COUNT  MIN SALARY  MAX SALARY  AVG SALARY
----  ---  -----  ----------  ----------  ----------
100   M    3      21313.97    72563.34    49124.73
200   F    3      27487.44    72789.20    49038.55
400   F    2      33543.13    42216.41    37879.77
Sometimes it is useful to combine the GROUP BY and HAVING clauses even where the aggregate function does not appear in the SELECT list:

SELECT EMP_DEPT_NO FROM EMPLOYEE
GROUP BY EMP_DEPT_NO
HAVING MIN(EMP_SALARY) < 25000

This query returns two rows (Departments 100 and 300) where the minimum salary is less than $25,000. You can also use HAVING in conjunction with a GROUP BY column that does not appear in the SELECT list, as you will see in the next section.
Subqueries
A subquery can be considered a way to join the result of two queries. Suppose you want to find all employees who work in a department whose average salary is greater than $55,000. You really need to first know which departments have an average salary greater than $55,000:

SELECT EMP_DEPT_NO, AVG(EMP_SALARY)
FROM EMPLOYEE
GROUP BY EMP_DEPT_NO

As shown in the following result, you need to list those employees in Departments 100 and 300:

EMP_DEPT_NO  AVG(EMP_SALARY)
-----------  ---------------
100          59522.08
200          50948.99
300          66958.36
400          53983.02
You can now run a second query:

SELECT EMP_NO, EMP_DEPT_NO, EMP_FNAME, EMP_LNAME, EMP_SALARY
FROM EMPLOYEE
WHERE EMP_DEPT_NO IN (100, 300)

The trick, however, is to ask both questions at once. To do this, you need a subquery:

SELECT EMP_NO, EMP_DEPT_NO, EMP_FNAME, EMP_LNAME, EMP_SALARY
FROM EMPLOYEE
WHERE EMP_DEPT_NO IN
    (SELECT EMP_DEPT_NO
     FROM EMPLOYEE
     GROUP BY EMP_DEPT_NO
     HAVING AVG(EMP_SALARY) > 55000)

The subquery returns a list of those departments where the department's average salary is greater than $55,000, and the full statement produces the following result set:

EMP_NO  EMP_DEPT_NO  EMP_FNAME  EMP_LNAME    EMP_SALARY
------  -----------  ---------  -----------  ----------
101     100          ANN        CALLAHAN     127500.00
105     100          EUNICE     BROWN        61426.08
113     100          MICHAEL    ANDERSON     72563.34
117     100          ROSE       DANIELS      67572.42
119     300          URSULA     SMITHSONIAN  104800.00
121     100          WINIFRED   VANCE        37712.10
123     300          ALAN       LAWRENCE     77568.70
125     100          CHARLES    GERRFON      53496.87
127     300          EUGENE     FITZ         58120.56
129     100          GEORGE     DE NORVA     21313.97
131     300          JOAN       WILLIAMS     74586.61
133     100          MAUREEN    PODANSKI     34591.90
135     300          PATRICIA   NORTON       61597.15
103     300          CAROL      SMITH        74748.78
107     300          GENEVIEVE  WESLEY       18331.50
111     300          KATHRYN    AMES         97600.00
115     300          OLIVERA    MAHARAMBA    35271.95
If I want to know which employee in each department has the highest salary, I need to use what is called a correlated subquery. A correlated subquery is one where a search condition in the inner query's WHERE clause is correlated to the current row of the outer query.
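The query itself did not survive in this excerpt; the following sketch is reconstructed to match the output shown next, using the alias EMP as the correlation name:

SELECT EMP_DEPT_NO, EMP_FNAME, EMP_LNAME, EMP_SALARY
FROM EMPLOYEE EMP
WHERE EMP_SALARY =
    (SELECT MAX(EMP_SALARY)
     FROM EMPLOYEE
     WHERE EMP.EMP_DEPT_NO = EMP_DEPT_NO)
ORDER BY EMP_DEPT_NO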
The output of the query looks like the following:

EMP_DEPT_NO  EMP_FNAME  EMP_LNAME    EMP_SALARY
-----------  ---------  -----------  ----------
100          ANN        CALLAHAN     127500.00
200          DAVID      MADISON      77492.47
300          URSULA     SMITHSONIAN  104800.00
400          PAUL       JAMESSON     73643.82
The alias EMP in the outer query acts as a correlation variable, allowing the inner query's WHERE clause to match EMP_DEPT_NO from the outer query's current row against the same column in the inner query.
You use the EXISTS keyword in subqueries to test for the existence of occurrences in an outer query. Suppose you want to find all departments that have one or more employees making less than $30,000. You can run a query and test whether the results of the inner query exist in the outer query. The following example selects EMP_DEPT_NO from EMPLOYEE and then restricts the results to those rows where the department number is also returned by the inner query:

SELECT EMP_DEPT_NO FROM EMPLOYEE EMP
WHERE EXISTS
    (SELECT EMP_DEPT_NO FROM EMPLOYEE
     WHERE EMP.EMP_NO = EMP_NO
     AND EMP_SALARY < 30000)
ORDER BY EMP_DEPT_NO

The result is:

EMP_DEPT_NO
-----------
100
200
300

You have a variety of other ways to join the results of an inner query to an outer query, and you can nest the queries as deeply as necessary. Two keywords that you will find in most databases are ANY and ALL. The ANY keyword allows you to compare a column from the outer query to any of the values returned by the inner query. Assuming you want to know all the male employees who have a female manager, you can code the query as follows:
SELECT EMP_NO, EMP_FNAME, EMP_LNAME
FROM EMPLOYEE
WHERE EMP_GENDER = 'M'
AND EMP_MGR_ID = ANY
    (SELECT EMP_NO FROM EMPLOYEE WHERE EMP_GENDER = 'F')
ORDER BY EMP_NO

The results are:

EMP_NO  EMP_FNAME  EMP_LNAME
------  ---------  ---------
112     LARRY      JOHANSSEN
113     MICHAEL    ANDERSON
116     PAUL       JAMESSON
120     VINCENT    LINCOLN
123     ALAN       LAWRENCE
125     CHARLES    GERRFON
129     GEORGE     DE NORVA
132     KEVIN      KERRIGAN
TIP A Few Words About COMMIT And ROLLBACK Chapter 11 delves into the concepts of transactions, sometimes called logical units of work (LUW). Every transaction ends with either a COMMIT or a ROLLBACK. COMMIT makes permanent all changes to the database in the current transaction. ROLLBACK puts the database back to where it was. In other words, it undoes all the changes made during the current transaction. In the following sections, I show you examples of commands that alter data. So that your results remain the same as mine, I highly urge you to create a backup of the database so that you can restore it to its original condition. Alternatively, you can issue a ROLLBACK command after each INSERT, DELETE, or UPDATE to undo all of the changes.
INSERT
It comes as no surprise that the INSERT statement is used to add new records to a table. The syntax of the command is:

INSERT INTO tablename [(column-list)] VALUES (value1, value2, ... valuen)

The tablename is an existing table on which the developer has insert authority. The column-list names those columns into which data is to be inserted. If values are supplied for all columns in table order, the column list may be omitted. The following statement adds a new row to the DEPARTMENT table. Because all columns are used, the column list is omitted:

INSERT INTO DEPARTMENT
VALUES (500, 'NE', 'New Department')

Notice that all character values must be enclosed in single quotes. If you use double quotes, SQL assumes you are referencing another column. If you are inserting a row where a column is to be set to NULL, you must specify the NULL keyword:

INSERT INTO DEPARTMENT VALUES (500, 'NE', NULL)
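Alternatively, when you supply only some of the columns, name them explicitly; columns you omit receive their defaults or NULL. A sketch using the DEPARTMENT columns referenced earlier in this chapter:

INSERT INTO DEPARTMENT (DEPT_NO, DEPT_NAME)
VALUES (501, 'Another New Department')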
DELETE
The DELETE statement removes one or more rows from a table. The syntax is:

DELETE FROM tablename [WHERE condition]

The tablename is any valid table. If you omit the WHERE clause, all rows are deleted. The following statement deletes all employees whose department is 100:

DELETE FROM EMPLOYEE WHERE EMP_DEPT_NO = 100
UPDATE
The UPDATE command updates one or more rows in a single table and takes the following syntax:

UPDATE tablename
SET column_name = value [, column_name = value]
[WHERE condition]

You can update any one or more columns, although most RDBMSs won't allow you to update the primary key columns. If you leave out the WHERE clause, all rows are updated. In the following code snippets, the first example gives all employees an 8 percent raise. The second command then gives all female employees in Departments 200, 300, and 400 an additional 10 percent raise (obviously to make up for the inequities noted in the GROUP BY discussion earlier in this chapter):

UPDATE EMPLOYEE SET EMP_SALARY = EMP_SALARY * 1.08

UPDATE EMPLOYEE SET EMP_SALARY = EMP_SALARY * 1.1
WHERE EMP_GENDER = 'F' AND EMP_DEPT_NO IN (200, 300, 400)
Where To Go From Here

This chapter has really only scratched the surface of what can be done with the SELECT statement. A point of confusion for any developer is where to perform processing. For instance, you can do a simple SELECT from the database and then sort the result set at the client using Visual Basic. Alternatively, you can use the ORDER BY clause to do the sorting at the database level. Although there is no hard and fast rule, common sense usually dictates the correct answer. In general, it is better to let the database perform the work unless doing so would generate more network traffic. The most obvious example is formatting numbers (adding thousands and decimal separators and currency symbols); that is probably best left to Visual Basic to add when it displays the data. I discuss these issues in future chapters.

If you feel comfortable with SQL, particularly with the SELECT statement, then read on. If not, you might want to consult one of the many books on the subject. If you will be developing for more than one RDBMS, you might want to consider books that are not RDBMS-specific. Otherwise, you probably want to find a book that is specific to your database. If your development will involve Microsoft Jet, you might want to consult Appendix B, where I discuss some of the vagaries of that dialect of SQL and how it differs from ANSI SQL. If you are using the ODBC API, you will also want to refer to Appendix C. If you haven't yet read Chapter 2, you might want to do so; I discuss the other two portions of SQL (DCL and DDL) there. For advanced topics, sneak a peek at Chapter 11. Otherwise, read the next chapter (Chapter 4), where I overview data access with Visual Basic. Finally, don't forget that the Internet provides a wealth of information. Specifically, each of the major database vendors has its own Web site, and a search of the knowledge base at Microsoft's Web site using the terms database and SQL yields many tricks and tips.
-----------
the ODBC API as a means of database development. During VB5's development, Microsoft discussed the new Active Data Object model. However, the technology was not ready by the time VB5 was shipping (and, in fact, it will be evolving for some time to come), and most VB developers therefore did not realize that the ADO model was another development option.

When the purchaser of Visual Basic 6 opens the box, he or she will again see something that looks basically familiar. However, the client/server developer will quickly lose that illusion when poking around under the covers. Wizards that once generated DAO- or RDO-based forms don't even offer those data models as an option. The days of ADO have arrived, and ADO 1.5 is a welcome, robust, and scalable data model upon which to develop. The ADO Data control, although not backward compatible with the RDC or the DAO Data control, is similar enough that most VB developers will have little problem adjusting. Best of all, ADO combines the best of DAO (access to ISAM databases) and RDO (efficient access to ODBC data sources) with a high-performance, low-overhead OLE DB engine.

Although this book discusses all of the data models, the unmistakable trend is toward Active Data Objects. In the pages that follow, I discuss each of these data models with a brief overview and some examples. In the chapters that follow this one, each model is examined in more depth. For some developers, it will be worth the effort to convert existing projects to ADO, but for others, there will not be enough payback at this point in time to do so.
-----------
Table 4.1 (fragment; the column headings and the first two rows did not survive in this excerpt):

Method                        Ratings
----------------------------  --  --  --  --
DAO/Jet*                       4   7   5   6
DAO/ODBCDirect Data controls   5   8   8   9
DAO/ODBCDirect*                6   7   5   5
RDO Data controls              6   9   8   9
RDO*                           7   8   5   6
VBSQL                          9   7   5   4
ODBC API                      10   4   1   1
ADO Data controls              9  10   8   9
ADO*                          10   9   5   7

*Writing
Table 4.1 lists 11 separate methods of data access that all overlap at some point or other. Table 4.2 shows which types of databases each method can access. Flat file I/O (using VB's Open and Get keywords) is, of course, the roll-your-own approach and offers little to no realistic likelihood of accessing a true relational database. Under some circumstances, flat file I/O may be appropriate for ISAM files. DAO/Jet (with or without the Data control) can process some sequential file formats, but only to the extent that they are structured. DAO/Jet can also access most major relational databases. DAO/ODBCDirect (with or without the Data control) can access almost any relational database for which an ODBC driver is supplied and can access many ISAM formats as long as an ODBC driver has been defined. RDO is strictly bound to ODBC data sources, so it can connect to virtually all relational databases and many ISAM files. VBSQL is used with Microsoft SQL Server using DB-Lib. Older versions of Sybase SQL Server also support the DB-Lib interface and should, in theory at least, be accessible via VBSQL. The ODBC API method, like RDO, can access any ODBC data source. ADO is meant to cover the entire spectrum of data sources, including those that may be defined at some point in the future.

Table 4.2 The types of databases and which data access methods can access each.

Method                        Sequential  ISAM     Relational
Flat File I/O                 Yes         Limited  No
DAO/Jet Data controls         Limited     Yes      Yes
DAO/Jet*                      Limited     Yes      Yes
DAO/ODBCDirect Data controls  No          Limited  Yes
DAO/ODBCDirect*               No          Limited  Yes
RDO Data controls             No          Limited  Yes
RDO*                          No          Limited  Yes
VBSQL                         No          No       DB-Lib only
ODBC API                      No          Yes      Yes
-----------
The five open_mode options are Append, Output, Input, Random, and Binary. The first three options are used where a file is to be accessed sequentially (as opposed to randomly). In Append mode, any data that you write is added to the end of the file. Conversely, in Output mode, any data overwrites the file from the beginning. I discuss Random and Binary modes in a moment. The default is Random.

The access_mode determines whether the file can be read from, written to, or both. The options are Read, Write, and ReadWrite. The default is ReadWrite.

The lock_mode option specifies how file locking is handled in a multiuser environment. The options are Shared, Lock Read, Lock Write, and Lock Read Write. If you open a file in Shared mode, any other process has full access to the file. In Lock Read mode, the file is locked against others reading it. Likewise, in Lock Write mode, other processes may not write to the file. The most restrictive (and default) mode is Lock Read Write, which prevents any other process from opening the file while your program has it open. If you have opened a file as Lock Write and another program attempts to open it in Output mode (for example), a runtime error is generated in that program.

You must assign an unused file_number when opening a file. All subsequent operations against the file reference the assigned file number. The number must be in the range of 1 to 511. Use numbers of 255 and below if the file will not be used by other processes. Use numbers above 255 if the file can be opened by other processes. Use the FreeFile function to return the next available file number:

' Return a file number from 1 to 255
Dim iFile1 As Integer
iFile1 = FreeFile

' Return a file number from 256 to 511
Dim iFile2 As Integer
iFile2 = FreeFile(1)

The pound sign (#) before the file number is optional; it is retained by Visual Basic for backward compatibility with Basic versions that did require it.

The record_length argument is used with files opened in Random mode and is ignored if the file is opened in Binary mode. With files opened randomly, all records in the file are normally a fixed length. This provides the ability to locate a record quickly by simply supplying a record number. For example, if all records in a file are 100 bytes long, the operating system can locate record number 4 by moving the file pointer to byte 301 in the file (assuming the first record begins at byte 1, the second record at byte 101, and so on). Of course, your program is responsible for maintaining the logic necessary to know where in the file a given record is. When a file is opened in Binary mode, you can move around the file by specifying an absolute byte offset from the beginning of the file or an offset from the current location in the file.

Once you have finished using a file, use the Close statement followed by the file number to close it. If you do not supply a file number, all currently open files are closed. When a file is closed, the file number is disassociated from that file and can be used again for another file. You should be sure to close all open files, especially those opened in Output or Append modes, because closing forces the changes to be written to disk. Failure to properly close the file can result in your changes being lost.
Listing 4.1 opens three files (in Append, Input, and Random modes). Notice that a user-defined type, EmpRec, is created and used as an argument to the Len clause on the Random file; the file will be used to store employee records.

Listing 4.1 Demonstration of several flat file access techniques.

' Create a user-defined data type for the random file
' (user-defined types must be declared at module level)
Type EmpRec
    Emp_No As Integer
    Emp_Name As String * 30
    Salary As Currency
End Type

' Variables to store file numbers
Dim iFile1 As Integer
Dim iFile2 As Integer
Dim iFile3 As Integer

' Open a file in Append mode
iFile1 = FreeFile
Open "c:\program.log" For Append Lock Read Write As iFile1

' Open a file in Input mode, read only
iFile2 = FreeFile
Open "c:\autoexec.log" For Input Access Read As iFile2

' Create a variable of type EmpRec
Dim strEmp As EmpRec

' Open a file randomly using a record length
iFile3 = FreeFile
Open "c:\employee.dat" For Random As iFile3 Len = Len(strEmp)

' Read record number 8 and display it
Get #iFile3, 8, strEmp
MsgBox strEmp.Emp_Name & "'s salary is " & _
    Format$(strEmp.Salary, "$###,##0.00")

' Close the first file
Close iFile1

' Close all other open files
Close
Reading Files
Use the Input # statement to read a file opened in Binary or Input mode, using the syntax Input # file_number, var1, var2. You use this statement to read a comma-separated list of variables from a file:

' Read three variables from a file
Dim myVar1, myVar2, myVar3
Open "c:\myfile.dat" For Input As 1
Input #1, myVar1, myVar2, myVar3
Close 1
The Input function (without the pound sign), also used with files opened in Binary or Input mode, differs from the Input # statement in that it reads a fixed number of characters from the file, including commas, carriage returns, and line feeds, as shown in the following example:

Dim sBuffer As String
Open "c:\autoexec.bat" For Input As 1
' Read 100 characters
sBuffer = Input(100, 1)

Line Input is used to read an entire line of text, up to (but not including) the carriage-return and line-feed characters. The following code reads a file and displays each line in a ListBox control. The result is shown in Figure 4.1:

Dim sBuffer As String
' Open the file
Open "c:\autoexec.bat" For Input As 1
Do While Not EOF(1)
    Line Input #1, sBuffer
    ' List1 is an existing ListBox control
    List1.AddItem sBuffer
Loop
Close 1
Figure 4.1 Example of using Line Input to populate a ListBox control with the contents of a file.

Notice that Line Input requires the use of the pound sign in front of the file number. Consistency is not Microsoft's middle name.

You use the Get statement to read a file opened in Random or Binary mode. With Random mode, you supply a record number, whereas with Binary mode, you supply a byte offset. If the record number or byte offset is omitted, reading begins at the next record or byte in the file. The following statement reads a file that was previously opened randomly:

' Read the 55th record into the strEmpRec variable
Get #iFile3, 55, strEmpRec

The following statements read the next 512 bytes into a variable from a file previously opened as Binary (the file number here is assumed for illustration):

' sBuffer must be pre-sized to the number of bytes to read
sBuffer = String$(512, " ")
Get #iFile1, , sBuffer
Navigating Files
The current position in the file is said to be the file pointer. Therefore, if the current record is number 55 in the file, the file pointer is 55. The Seek statement moves the file pointer (that is, it sets the location in the file for the next read or write), whereas the Seek function returns the current file pointer:

' Move to the 30th record
Seek #iFile3, 30
' Read the next record
Get #iFile3, , strEmpRec
' Display the current record number (31)
MsgBox Seek(iFile3)

The Loc function is the counterpart to Seek, returning the location of the last read or write. The counterpart to Get is Put, which writes a record (in Random mode) or the value of a variable (in Binary mode):

Put #iFile3, , strEmpRec
Writing To Files
Put has additional functionality that varies somewhat depending on whether the file was opened for Binary or Random. For instance, you can write the contents of an array to disk. See the Visual Basic help file for additional information. The Write # statement writes data to a file in the same format that it is read with the Input # function. Write # places quotes around strings and commas after each variable. A carriage-return and line-feed sequence is written after each Write # operation. Dates are surrounded by pound signs (#December 25, 1998#). Boolean data is written as #TRUE# or #FALSE#. Print # writes formatted data to a sequential file. Strings do not have quotes and variables are not separated by commas. However, a carriage-return and line-feed sequence is written after each Print # operation unless the last character is a semicolon or comma. Print # is useful for writing formatted reports to a disk file for printing later. See the Visual Basic help file for more details on some of the ways that you can use Print #.
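To make the contrast concrete, here is a minimal sketch (the file name is hypothetical) showing what each statement writes:

' Contrast Write # and Print # output formats
Open "c:\demo.txt" For Output As #1
Write #1, "Smith", 42    ' writes: "Smith",42
Print #1, "Smith", 42    ' writes: Smith         42   (print zones)
Close #1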
-----------
' Opening lines reconstructed: read a Binary file 512 bytes at a time
Dim sBuffer As String: sBuffer = String$(512, " ")
Dim lRemaining As Long
Open "c:\myfile.dat" For Binary As 1
lRemaining = LOF(1)
Do
    ' Read the data
    Get #1, , sBuffer
    lRemaining = lRemaining - Len(sBuffer)
    If lRemaining < 1 Then Exit Do
Loop

The Visual Basic help file contains information on other functions that may be useful, such as FileDateTime (which returns the date and time when a file was last modified), GetAttr (which returns the attributes of a file, such as Hidden or System), and SetAttr (which sets file attributes).
Capabilities
With the exception of ADO, DAO is the most flexible of the data access methods available to the VB developer. It offers the ability to manipulate ISAM databases as well as relational databases, shielding the developer from many complexities and creating a layer where the developer can use a common code base to interact with multiple back ends. In other words, if the developer uses a reasonable amount of planning, he or she can use the same code to interact with both FoxPro and Oracle. The actual implementation of DAO is shown in Figure 4.2.
Figure 4.2 The relationship of DAO to the application and to the database.

The original implementation of DAO was with Microsoft Jet, which in turn interacts with the ODBC Driver Manager. The ODBC Driver Manager handles all low-level interactions with the database itself. Unfortunately, what Jet gains in flexibility (because it can interface to virtually any data source), it loses in performance. Jet is robust in its capabilities but is a thick layer between your applications and the data, slowing down all database access. Also, Jet leaves a large footprint in memory; it occupies more than a megabyte of RAM. The seasoned (or should I say grizzled?) Visual Basic developer who wrote database applications with Visual Basic 3 used Jet version 1.1. This was a primitive implementation: Jet 1.1 pulled all of the records from a table in a back-end database such as Oracle and then performed the query locally. Jet's subsequent performance enhancements derive in large part from allowing the back-end database to do what it does best: process queries.
ODBCDirect was added to DAO with version 5 of Visual Basic. It is not as flexible as Jet in that it can only communicate with defined ODBC data sources. However, it bypasses Jet and communicates with RDO. It also offers some functionality improvements because it can access ODBC features not available when going through Jet. Because ODBCDirect is a thin layer leaving a minimal footprint in memory, it offers performance nearly as good as RDO itself. ODBCDirect essentially maps DAO functionality to RDO functionality. The RDO layer interacts with the ODBC Manager. In Chapter 5, I discuss the DAO object hierarchy in depth.
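In code, selecting ODBCDirect rather than Jet is a matter of how the Workspace is created. A minimal sketch, assuming an ODBC data source named Sample exists (the DSN and credentials are hypothetical):

' Create an ODBCDirect workspace instead of a Jet workspace
Dim wrk As Workspace
Dim cn As Connection

Set wrk = DBEngine.CreateWorkspace("ODBCWork", "admin", "", dbUseODBC)
Set cn = wrk.OpenConnection("", dbDriverNoPrompt, False, _
    "ODBC;DSN=Sample;UID=admin;PWD=")

The dbUseODBC constant is what routes the workspace through ODBCDirect; with dbUseJet (the default), the same DAO code would go through the Jet engine instead.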
Using DAO
DAO has a fairly complex object hierarchy. (See Figures 5.1 and 5.2 in Chapter 5 for a graphical representation of this hierarchy). In this model, all data access objects exist in the context of a single DBEngine object.
later in this chapter. For new relational database development efforts, I see no compelling reason to use DAO over ADO, because OLE DB seems to be a much more robust technology than Jet and especially because Microsoft is clearly going to concentrate its development bucks there. For the typical desktop development effort, I see no reason not to use Jet against ISAM-type databases, including MS Access. If more robustness is needed (such as logging), then I recommend a product such as Personal Oracle or Sybase SQL Anywhere, both of which offer good performance with modest resource requirements. Otherwise, Jet is a proven data engine technology for file-oriented databases.

The DBEngine object contains the Workspaces collection, which consists of individual Workspace objects. Each Workspace object describes a current session with the database and includes the Databases collection. Each Database object includes several collections, including QueryDefs and Recordsets. A QueryDef object describes an SQL SELECT. Each Recordset object is an active result set from the database. The actual implementation of the object hierarchy varies a little based on whether ODBCDirect is used. Because the objects themselves vary, there is also some variation in the methods and events available to the VB developer. For instance, ODBCDirect offers the Cancel method to cancel a pending asynchronous query.

DAO gives the developer the opportunity to use data-aware controls, greatly simplifying the task of DAO-based development. This opportunity comes at the expense of some flexibility in design but can turn 200 lines of code into 20 or fewer. For instance, the Data control automatically connects to the database, builds a Recordset object, handles the chore of scrolling through the Recordset, and performs all updates behind the scenes. Without the Data control, the VB developer has to code all of this functionality. Likewise, by having a Data control, the user can bind other data-aware controls, such as the TextBox and the ListBox, to individual columns in the Data control's Recordset object. Without the Data control, the developer has to manually populate each TextBox (or other control) as the user scrolls through a Recordset. Figure 4.3 shows a form module that uses the Data control and several TextBox controls to display records from the Employee table in the Access version of the sample database provided with this book.
Figure 4.3 The Employee application created using the Data control and very little coding.
TIP Using The Sample Code Because the application in Figure 4.3 uses an Access database without ODBC, the path to the database file is hard-coded in the DatabaseName property of the Data control. In addition, because the database is on your CD-ROM, the database cannot be updated. To run this code on your own, copy the project file and the form file to your hard drive along with the Access database. Open the project and change the DatabaseName property of the Data control to reflect the new path.
I actually wrote only one line of code (shown below) to display the record number. The rest of the code was all generated as a result of drawing controls on the form and setting their properties as needed.

Private Sub Data1_Reposition()
    Data1.Caption = "Record " & _
        Str(Data1.Recordset.AbsolutePosition + 1) & _
        " of " & Str(Data1.Recordset.RecordCount)
End Sub

In Chapter 5, I expand on the use of DAO, adding functionality such as error handling, data validation, and so on, using the sample data provided with this book (see Appendix A). I will also walk through the process of coding a DAO-based application without using the Data control, both as a means of illustrating the DAO object hierarchy and to provide additional flexibility to the application. A first taste of that approach appears in the sketch below.
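A minimal sketch of walking the hierarchy without a Data control; the .mdb path is hypothetical, but the objects shown (DBEngine, Workspace, Database, Recordset) are the DAO objects just described:

' Open a Jet database and loop through a result set manually
Dim wrk As Workspace
Dim db As Database
Dim rs As Recordset

Set wrk = DBEngine.Workspaces(0)    ' the default session
Set db = wrk.OpenDatabase("C:\data\sample.mdb")
Set rs = db.OpenRecordset("SELECT EMP_FNAME, EMP_LNAME FROM EMPLOYEE")

Do While Not rs.EOF
    Debug.Print rs!EMP_FNAME & " " & rs!EMP_LNAME
    rs.MoveNext
Loop

rs.Close
db.Close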
RDO Capabilities
RDO puts a much thinner layer between your application and the data than DAO does, as shown in Figure 4.4. RDO basically places a thin wrapper around the ODBC API, allowing the developer most of the benefits of the ODBC API with few drawbacks. The chief advantages of RDO are:

No need for a local query processor (Jet); query processing is done remotely.
A smaller memory footprint.
Event-driven asynchronous queries.
Figure 4.4 The Remote Data Objects model.

The Visual Basic 5.0 implementation of RDO (RDO 2.0) and the Remote Data control (RDC) was a significant improvement over the original Visual Basic 4.0 implementation (RDO 1.0). For example, under Visual Basic 4.0, you had to continuously poll for the completion of a database operation. RDO 6.0, introduced with Visual Basic 6 (don't ask me what happened to versions 3, 4, and 5), is another improvement. Unlike DAO, in which data access is usually performed synchronously (meaning that the program has to wait for the result set to be returned before processing can continue), RDO permits asynchronous queries of the database. For example, if you set the RDC's Options property to rdAsyncEnable, the ResultSet is populated as a background task. When the query is complete (and the ResultSet fully populated), the QueryCompleted event is fired, giving your application an event-driven means of knowing that the query has finished. The RDC's functionality is similar to that of the Data control, which most VB developers have used. Likewise, the Remote Data Object hierarchy is similar to (though simpler than) the Data Access Object hierarchy.
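To make the asynchronous model concrete, here is a hedged sketch using the rdoConnection object rather than the RDC. It assumes a project reference to the Microsoft Remote Data Object library and an ODBC DSN named Coriolis VB Example with hypothetical credentials, and it uses the rdoConnection's QueryComplete event rather than the RDC event described above.

' Module-level declarations so the connection's events can fire
Private WithEvents mCn As rdoConnection
Private mRs As rdoResultset

Private Sub Form_Load()
    Set mCn = New rdoConnection
    ' Hypothetical DSN and credentials
    mCn.Connect = "DSN=Coriolis VB Example;UID=coriolis;PWD=coriolis"
    mCn.EstablishConnection rdDriverNoPrompt
    ' rdAsyncEnable returns control immediately; the result set
    ' is populated as a background task
    Set mRs = mCn.OpenResultset("SELECT * FROM employee", _
        rdOpenKeyset, rdConcurReadOnly, rdAsyncEnable)
End Sub

Private Sub mCn_QueryComplete(ByVal Query As rdoQuery, _
    ByVal ErrorOccurred As Boolean)
    If Not ErrorOccurred Then
        MsgBox "Query done; " & mRs.RowCount & " rows."
    End If
End Sub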
Using RDO
Figure 4.5 shows an application created using the RDC that is almost identical to the DAO example in Figure 4.3. The snippet below places the current row number into the Caption property of the RDC. The properties of the RDC are almost identical to those of the Data control; the few differences mostly reflect the SQL row orientation of the Remote Data control versus the file record orientation of the Data control. For instance, DAO's Recordset property is equivalent to RDO's Resultset property, and DAO's RecordCount property is equivalent to RDO's RowCount property, as shown in the following code snippet:

Private Sub MSRDC1_Reposition()
    MSRDC1.Caption = "Record " & _
        Str(MSRDC1.Resultset.AbsolutePosition) & _
        " of " & Str(MSRDC1.Resultset.RowCount)
End Sub
Figure 4.5 The Employee application coded using the Remote Data control.
VBSQL
VBSQL is a library for accessing Microsoft SQL Server via DB-Lib. Because Microsoft is moving away from DB-Lib and toward ODBC- and OLE DB-oriented access to SQL Server, VBSQL is not recommended except for maintaining existing projects that already use it.
VBSQL Capabilities
VBSQL is written to provide an interface to Microsoft SQL Server only. Specifically, any functionality exposed by SQL Server's DB-Lib is implemented by VBSQL. Connections tend to be cursor-oriented, and you are pretty much on your own to manually populate controls with information retrieved from the database (as opposed to binding controls to a data source). On the other hand, you have an unusual degree of control (relative to other Visual Basic data models) over the efficiency of your database processing. For instance, you can set the packet size for communication between the database server and your application by using the SQLSetPacket function before calling the SQLOpen function. (Note that you can determine the current packet size using SQLGetPacket, but once a connection is established, the packet size cannot be changed.) The packet size can be any size up to 64K and, if not specified, is determined by the current SQL Server default. If the default is set to 4,096 bytes but your application is mostly updating and retrieving single records, the large packet size slows down your application. On the other hand, if your application processes a lot of data at a time, a larger packet size is preferable because it reduces the number of network round trips.
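As a hedged illustration, setting the packet size might look like the following. Only the function names come from the discussion above; the argument lists are assumptions modeled on the DB-Lib calls these wrappers mirror, and myLogIn is the login record obtained from SQLLogIn (shown later in this chapter).

Dim lRtn As Long
Dim lSQLConn As Long
Dim lCurrent As Long

' Sketch only: set a small packet size for chatty, single-row traffic.
' SQLSetPacket works against the login record, before the connection opens
lRtn = SQLSetPacket(myLogIn, 512)
lSQLConn = SQLOpen(myLogIn, "Home")
' Report the size in effect; it cannot be changed once connected
lCurrent = SQLGetPacket(lSQLConn)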
Using VBSQL
To use VBSQL, you need to visit the Microsoft Web site and download the VBSQL ActiveX control (look for VBSQL.OCX). The version I used in this book was a prerelease version, but a shipping version should be available by the time you read this. VBSQL requires the use of the VBSQL control, which is really more like a DLL than a Visual Basic control. You need to add to your project the VBSQL.BAS module, which is also available from the Microsoft Web site and included on the CD-ROM (be sure to check the Web site for a more recent release). The VBSQL.BAS module declares a number of functions, such as those for establishing a connection with the database. The declarations all reference the OCX file, as shown in this code snippet:

Declare Function SqlNextRow Lib "VBSQL.OCX" _
    (ByVal SqlConn As Long) As Long

Figure 4.6 shows a VBSQL application running in front of the VB6 development environment. A quick look at the two open code windows reveals that this coding, although not terrifically difficult, is tedious. Because Microsoft is in the process of eliminating the DB-Lib interface, and because the Active Data Object model is available, you have little reason to use VBSQL code. Because of the length of the code, and the fact that I don't recommend VBSQL when clearly better data access models are available, the entire 20 or so pages of code are not printed in the book.
Figure 4.6 The Employee application using VBSQL.

To open a connection to the database, a sample code snippet might look like the following:

Dim lSQLConn As Long
Dim lRtn As Long
Dim sRtn As String
' Share with other procedures
Private myLogIn As Long

' Initialize
sRtn = SQLInit()
' Establish login record
myLogIn = SQLLogIn()
' Set login parameters
lRtn = SQLSetLUser(myLogIn, "Coriolis")
lRtn = SQLSetLPwd(myLogIn, "Coriolis")
' Workstation
lRtn = SQLSetLHost(myLogIn, "W243A")
' Can add other parameters
lRtn = SQLSetLApp(myLogIn, "Coriolis VBSQL Sample App")
' Establish connection
lSQLConn = SQLOpen(myLogIn, "Home")
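Once connected, retrieving rows follows the usual DB-Lib pattern. The following is a sketch only: SqlNextRow appears in the declarations shown earlier, while the other function names and the NOMOREROWS constant are assumptions based on the DB-Lib calls that VBSQL.BAS wraps.

' Send a query and walk the result rows, DB-Lib style (sketch)
lRtn = SqlCmd(lSQLConn, "SELECT emp_no, emp_lname FROM employee")
lRtn = SqlExec(lSQLConn)
lRtn = SqlResults(lSQLConn)
Do While SqlNextRow(lSQLConn) <> NOMOREROWS
    ' SqlData returns column values as strings
    Debug.Print SqlData(lSQLConn, 1); " "; SqlData(lSQLConn, 2)
Loop
SqlClose lSQLConn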
Listing 4.3 Code from frmDSNList.

Private Sub cmdClose_Click()
    End
End Sub

Private Sub cmdGetSources_Click()
    Dim iRtn As Integer, iDSNLen As Integer, iDescLen As Integer
    Dim hEnv As Long
    Dim sDSN As String * 32, sDesc As String * 128

    ' Clear the list box
    List1.Clear
    ' Allocate env handle
    iRtn = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, _
        hEnv)
    ' Set environmental variables
    iRtn = SQLSetEnvAttr(hEnv, SQL_ATTR_ODBC_VERSION, _
        SQL_OV_ODBC3, SQL_IS_INTEGER)
    ' Get first data source
    iRtn = SQLDataSources(hEnv, SQL_FETCH_FIRST, sDSN, _
        Len(sDSN), iDSNLen, sDesc, Len(sDesc), iDescLen)
    Do While iRtn = SQL_SUCCESS
        ' Add DSN to listbox
        List1.AddItem Left$(sDSN, iDSNLen)
        ' See if there are any more
        iRtn = SQLDataSources(hEnv, SQL_FETCH_NEXT, sDSN, _
            Len(sDSN), iDSNLen, sDesc, Len(sDesc), iDescLen)
    Loop
    ' Free the handle
    iRtn = SQLFreeHandle(SQL_HANDLE_ENV, hEnv)
    ' Report the results
    Text1.Text = Str$(List1.ListCount)
End Sub
Figure 4.7 Listing the ODBC data sources using the ODBC API.
The project, included on the CD-ROM, consists of a general code module (ODBC.BAS), which contains the API declarations and constants, and a form module, frmDSNList.
ADO Capabilities
ADO is actually an interface to OLE DB, which provides a common access point for both relational and nonrelational data structures, as well as for such disparate and unstructured data sources as text, graphics, and email. In fact, with OLE DB, you can relate two entirely separate data sources to each other as long as an OLE DB driver has been written for each data source. For example, you could read an Employee table in an Oracle, Access, or DB2 database. You could then join the Emp_Email column in that table to the Pop3_Account_Name column in a Microsoft Outlook data source (when and if an OLE DB driver is written to access Outlook). Whereas ADO provides a high-level interface to the data, OLE DB is a low-level interface. By and large, VB developers can ignore the dirty business of bits and bytes in which OLE DB deals and concentrate on the streamlined interface provided by ADO. ADO encapsulates OLE DB functionality (and, by extension, data) and exposes it as objects. When dealing with OLE DB data sources, OLE DB is considered the data provider, whereas your application (and the Active Data Objects) is considered the data consumer.
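As a quick, minimal sketch of that streamlined interface, the following opens a connection and walks a result set. It assumes a project reference to the Microsoft ActiveX Data Objects library and uses the sample DSN and employee table referenced elsewhere in this chapter, with hypothetical credentials.

Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset

Set cn = New ADODB.Connection
' MSDASQL is the OLE DB provider for ODBC; the DSN is the sample database
cn.Open "Provider=MSDASQL.1;Data Source=Coriolis VB Example;" & _
    "User ID=coriolis;Password=coriolis"

Set rs = New ADODB.Recordset
rs.Open "SELECT emp_no, emp_lname FROM employee", cn, _
    adOpenStatic, adLockReadOnly
Do While Not rs.EOF
    Debug.Print rs!emp_no; " "; rs!emp_lname
    rs.MoveNext
Loop
rs.Close
cn.Close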
Figure 4.8 The relationship between the Orders, LineItem, and Item tables.

Now, consider a selection from the Orders table along with all related rows in the LineItem and Item tables. This data hierarchy can be modeled using hierarchical cursors in the new Data Environment Designer. You can build a DataEnvironment object that encapsulates the hierarchy and then assign that DataEnvironment object, rather than a Data control, to a data-aware control's DataSource property. When you do so, the DataEnvironment object's Command objects are exposed as data members. In Figure 4.9, I am creating a DataEnvironment object using the Data Environment Designer. After creating a DataEnvironment object, you can insert stored procedures previously created on the database, or you can add new Connection objects or Command objects. A Connection object represents a connection to the database, including various parameters such as user ID, password, and timeout. A Command object can be based on a table (similar to a DAO RecordSource of the Table type), an existing SQL view or stored procedure, or an SQL statement. In Figure 4.9, I am creating an SQL statement type of Command.
Figure 4.9 Creating the deADOExample DataEnvironment object using the Data Environment Designer.
When placing a data-aware control on a form, you can set its DataSource property to any valid DataEnvironment object. Because the DataEnvironment object is a project-level object (as opposed to a form-level object, such as a Data control), any control on any form may access it. Once you have selected a DataEnvironment object as a control's DataSource property, you can use the control's DataMember property to select any of the Command objects of the DataEnvironment object. In Figure 4.10, I have just selected my deADOExample DataEnvironment object as the DataSource for the DataGrid control. I then selected the Ord_Rpt Command object as the DataMember property.
Figure 4.10 Using a DataEnvironment object as the DataSource of a data-bound control.

Of course, none of this is to say that you can't use the ADO Data control as you would the DAO Data control or the RDO Remote Data control. In Figure 4.11, I do exactly that. The figure shows two forms open inside an MDIForm. The top form is the form that was designed in Figure 4.10, displaying rows of data from the three tables represented in Figure 4.8. The bottom form is the same Employee Maintenance form that I created using DAO and RDO earlier in this chapter.
Figure 4.11 An MDIForm housing two other forms using a DataGrid control bound to a DataEnvironment object and some TextBox controls bound to an ADO Data control.
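The binding shown in Figure 4.10 can also be established in code rather than in the Properties window. Here is a minimal sketch, assuming the deADOExample object and its Ord_Rpt Command described above:

Private Sub Form_Load()
    ' Bind the grid to the project-level DataEnvironment at runtime
    Set DataGrid1.DataSource = deADOExample
    DataGrid1.DataMember = "Ord_Rpt"
End Sub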
The code for this application is included on the CD-ROM. To build it yourself, first create an MDI form named mdiADOExamples. Add two forms named frmADOEmployee and frmADOReport, and set both of their MDIChild properties to True. On the MDI form, add a File menu with three items: Report, Employee, and Exit. For the mnuFileReport_Click event, code the following:

frmADOReport.Show

In the mnuFileEmployee_Click event, code the following:

frmADOEmployee.Show

On frmADOEmployee, add an ADO Data control. (To put the control on the toolbox, select the Microsoft ADO Data control from the Project|Components menu.) Set its properties as follows: Align=2 - Align Bottom; Connect=DSN=Coriolis VB Example; DataSourceName=Coriolis VB Example; RecordSource=employee. Next, create an array of five TextBox controls named txtEmpDat. The DataSource property for each should be Adodc1. Set their DataField properties to emp_no, emp_fname, emp_lname, emp_hire_date, and emp_salary, respectively. You can set the DataFormat properties for the date and salary fields as you see fit. Add Label controls as shown in Figure 4.11, and then add the following code:

Private Sub Adodc1_MoveComplete(ByVal adReason As _
    EventReasonEnum, ByVal pError As Error, adStatus _
    As EventStatusEnum, ByVal pRecordset As Recordset)
    Adodc1.Caption = "Record " & _
        Str$(Adodc1.Recordset.AbsolutePosition) & " of " & _
        Str$(Adodc1.Recordset.RecordCount)
End Sub

On frmADOReport, add a DataGrid control named DataGrid1. Set the DataGrid control's properties as follows: AllowArrows=True; AllowSizing=True; Caption=Order Details; DataSource=deADOExample; DataMember=Ord_Rpt; WrapText=True. The DataField properties for the columns should be set to ord_no, ord_date, ord_cust_no, line_no, line_item_no, item_desc, line_qty, line_price, and line_total. The Caption and Format properties should be set appropriately, such as those shown in Figure 4.11.

deADOExample has one Connection named CorVBExample. Its SourceOfData property should be set to 3 - deUseOLEDBConnectString, and its ConnectionSource property to Provider=MSDASQL.1; Password=coriolis; User ID=coriolis; Data Source=Coriolis VB Example. The Connection object has two Command objects, named Orders_Only and Ord_Rpt. For Orders_Only, set the CommandType property to 2 - adCmdTable. Set other properties as follows: CommandText=Coriolis.Orders; CursorLocation=3 - adUseClient; CursorType=3 - adOpenStatic; and LockType=1 - adLockReadOnly. For the Ord_Rpt Command object, set properties as follows: CommandType=1 - adCmdText; CursorLocation=3 - adUseClient; CursorType=3 - adOpenStatic; LockType=1 - adLockReadOnly; and CommandText set to the following SQL statement:

SELECT orders.ord_no, orders.ord_date, orders.ord_cust_no,
    line_item.line_no, line_item.line_item_no, item.item_desc,
    line_item.line_qty, line_item.line_price, line_item.line_total
FROM item, line_item, orders
WHERE item.item_no = line_item.line_item_no
    AND line_item.line_ord_no = orders.ord_no
ORDER BY orders.ord_no, line_item.line_no

In the CD-ROM code listing, you will see other Command objects as well, though they are not used in the sample application. Each Command object exists in the context of a Connection object: the Connection object must exist first, and it must be specified as the parent of the Command object. This brings us to the simplified (and nonhierarchical) object model of ADO relative to DAO and RDO, as shown in Figure 4.12. (Note that the figure itself is somewhat simplified. For instance, Errors is actually a collection of Error objects.)
Figure 4.12 The ADO data model.

One of the most impressive and useful aspects of ADO and VB6 is that you can create a DataEnvironment object and use it on multiple forms (indeed, in any module). Contrast that with the Data control or the Remote Data control, which must be re-created on each form that uses them. This change is a leap forward in the continued evolution of Visual Basic into an object-oriented development tool.
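To see the Connection-then-Command relationship in code rather than in the designer, here is a minimal ADO sketch. It again assumes a reference to the Microsoft ActiveX Data Objects library, the sample DSN, and hypothetical credentials; the orders table comes from the sample database.

Dim cn As ADODB.Connection
Dim cmd As ADODB.Command
Dim rs As ADODB.Recordset

' The Connection object must exist first...
Set cn = New ADODB.Connection
cn.Open "DSN=Coriolis VB Example;UID=coriolis;PWD=coriolis"

' ...and is then specified as the parent of the Command
Set cmd = New ADODB.Command
Set cmd.ActiveConnection = cn
cmd.CommandType = adCmdText
cmd.CommandText = "SELECT ord_no, ord_date FROM orders"

Set rs = cmd.Execute
Do While Not rs.EOF
    Debug.Print rs!ord_no; " "; rs!ord_date
    rs.MoveNext
Loop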