http://www.tobari-kaikei.com/cgi-bin/login.cgi How to Think Like a Computer Scientist Learning with Python ii How to Think Like a Computer Scientist Learning with Python Allen Downey Jeffrey Elkner Chris Meyers Green Tea Press Wellesley, Massachusetts Copyrightc 2002 Allen Downey, Jeffrey Elkner, and Chris Meyers. Edited by Shannon Turlington and Lisa Cutler. Cover design by Rebecca Gimenez. Printing history: April 2002: First edition. August 2008: Second printing. Green Tea Press 1 Grove St. P.O. Box 812901 Wellesley, MA 02482 Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being ¡ÈForeword,¡É ¡ÈPreface,¡Éand ¡ÈContributor List,¡É with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the appendix entitled ¡ÈGNU Free Documentation License.¡É The GNU Free Documentation License is available from www.gnu.org or by writing to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307,USA. The original form of this book is LATEX source code. Compiling this LATEX source has he effect of generating a device-independent representation of a textbook, which can be converted to other formats and printed. The LATEX source for this book is available from http://www.thinkpython.com Publisher¡Çs Cataloging-in-Publication (provided by Quality Books, Inc.) Downey, Allen How to think like a computer scientist : learning with Python / Allen Downey, Jeffrey Elkner, Chris Meyers. – 1st ed. p. cm. Includes index. ISBN 0-9716775-0-6 LCCN 2002100618 1. Python (Computer program language) I. Elkner, Jeffrey. II. Meyers, Chris. III. Title QA76.73.P98D69 2002 005.13¡Ç3 QBI02-200031 Foreword By David Beazley As an educator, researcher, and book author, I am delighted to see the completion of this book. Python is a fun and extremely easy-to-use programming language that has steadily gained in popularity over the last few years. Developed over ten years ago by Guido van Rossum, Python¡Çs simple syntax and overall feel is largely derived from ABC, a teaching language that was developed in the 1980¡Çs. However, Python was also created to solve real problems and it borrows a wide variety of features from programming languages such as C++, Java, Modula-3,and Scheme. Because of this, one of Python¡Çs most remarkable features is its broad appeal to professional software developers, scientists, researchers, artists,and educators. Despite Python¡Çs appeal to many different communities, you may still wonder ¡Èwhy Python?¡É or ¡Èwhy teach programming with Python?¡É Answering these questions is no simple task—especially when popular opinion is on the side of more masochistic alternatives such as C++ and Java. However, I think the most direct answer is that programming in Python is simply a lot of fun and moreproductive. When I teach computer science courses, I want to cover important concepts in addition to making the material interesting and engaging to students. Unfortunately,there is a tendency for introductory programming courses to focus far too much attention on mathematical abstraction and for students to become frustrated with annoying problems related to low-level details of syntax, compilation, and the enforcement of seemingly arcane rules. Although such abstraction and formalism is important to professional software engineers and students who plan to continue their study of computer science, taking such an approach in an introductory course mostly succeeds in making computer science boring. When I teach a course, I don¡Çt want to have a room of uninspired students. I would much rather see them trying to solve interesting problems by exploring different ideas, taking unconventional approaches, breaking the rules, and learning from their mistakes. vi Foreword In doing so, I don¡Çt want to waste half of the semester trying to sort out obscure syntax problems, unintelligible compiler error messages, or the several hundred ways that a program might generate a general protection fault. One of the reasons why I like Python is that it provides a really nice balance between the practical and the conceptual. Since Python is interpreted, beginners can pick up the language and start doing neat things almost immediately without getting lost in the problems of compilation and linking. Furthermore, Python comes with a large library of modules that can be used to do all sorts of tasks ranging from web-programming to graphics. Having such a practical focus is a great way to engage students and it allows them to complete significant projects. However, Python can also serve as an excellent foundation for introducing important computer science concepts. Since Python fully supports procedures and classes, students can be gradually introduced to topics such as procedural abstraction, data structures, and object-oriented programming—all of which are applicable to later courses on Java or C++. Python even borrows a number of features from functional programming languages and can be used to introduce concepts that would be covered in more detail in courses on Scheme and Lisp. In reading Jeffrey¡Çs preface, I am struck by his comments that Python allowed him to see a ¡Èhigher level of success and a lower level of frustration¡É and that he was able to ¡Èmove faster with better results.¡É Although these comments refer to his introductory course, I sometimes use Python for these exact same reasons in advanced graduate level computer science courses at the University of Chicago In these courses, I am constantly faced with the daunting task of covering a lot of difficult course material in a blistering nine week quarter. Although it is certainly possible for me to inflict a lot of pain and suffering by using a language like C++, I have often found this approach to be counterproductive—especially when the course is about a topic unrelated to just ¡Èprogramming.¡É I find that using Python allows me to better focus on the actual topic at hand while allowing students to complete substantial class projects. Although Python is still a young and evolving language, I believe that it has a bright future in education. This book is an important step in that direction. David Beazley University of Chicago Author of the Python Essential Reference
Preface
By Jeff Elkner This book owes its existence to the collaboration made possible by the Internet and the free software movement. Its three authors—a college professor, a high school teacher, and a professional programmer—have yet to meet face to face, but we have been able to work closely together and have been aided by many wonderful folks who have donated their time and energy to helping make this book better. We think this book is a testament to the benefits and future possibilities of this kind of collaboration, the framework for which has been put in place by Richard Stallman and the Free Software Foundation. How and why I came to use Python In 1999, the College Board¡Çs Advanced Placement (AP) Computer Science exam was given in C++ for the first time. As in many high schools throughout the country, the decision to change languages had a direct impact on the computer science curriculum at Yorktown High School in Arlington, Virginia, where I teach Up to this point, Pascal was the language of instruction in both our first-year and AP courses. In keeping with past practice of giving students two years of exposure to the same language, we made the decision to switch to C++ in the first-year course for the 1997-98 school year so that we would be in step with the College¡¡Board¡Çs change for the AP course the following year. Two years later, I was convinced that C++ was a poor choice to use for introducing students to computer science. While it is certainly a very powerful programming language, it is also an extremely difficult language to learn and teach. I found myself constantly fighting with C++¡Çs difficult syntax and multiple ways of doing things, and I was losing many students unnecessarily as a result. Convinced there viii Preface¡¡had to be a better language choice for our first-year class, I went looking for an alternative to C++.¡¡I needed a language that would run on the machines in our Linux lab as well as on¡¡the Windows and Macintosh platforms most students have at home. I wanted it to¡¡be free and available electronically, so that students could use it at home regardless of their income. I wanted a language that was used by professional programmers,¡¡and one that had an active developer community around it. It had to support¡¡both procedural and object-oriented programming. And most importantly, it had¡¡to be easy to learn and teach. When I investigated the choices with these goals¡¡in mind, Python stood out as the best candidate for the job. I asked one of Yorktown¡Çs talented students, Matt Ahrens, to give Python a try. In two months he not only learned the language but wrote an application called¡¡pyTicket that enabled our staff to report technology problems via theWeb. I knew¡¡that Matt could not have finished an application of that scale in so short a time¡¡in C++, and this accomplishment, combined with Matt¡Çs positive assessment of¡¡Python, suggested that Python was the solution I was looking for. Finding a textbook Having decided to use Python in both of my introductory computer science classes¡¡the following year, the most pressing problem was the lack of an available textbook. Free content came to the rescue. Earlier in the year, Richard Stallman had introduced¡¡me to Allen Downey. Both of us had written to Richard expressing an¡¡interest in developing free educational content. Allen had already written a firstyear¡¡computer science textbook, How to Think Like a Computer Scientist. When¡¡I read this book, I knew immediately that I wanted to use it in my class. It was¡¡the clearest and most helpful computer science text I had seen. It emphasized¡¡the processes of thought involved in programming rather than the features of a particular language. Reading it immediately made me a better teacher. How to Think Like a Computer Scientist was not just an excellent book, but it¡¡had been released under a GNU public license, which meant it could be used¡¡freely and modified to meet the needs of its user. Once I decided to use Python,¡¡it occurred to me that I could translate Allen¡Çs original Java version of the book¡¡into the new language. While I would not have been able to write a textbook on¡¡my own, having Allen¡Çs book to work from made it possible for me to do so, at the¡¡same time demonstrating that the cooperative development model used so well in software could also work for educational content. Working on this book for the last two years has been rewarding for both my¡¡students and me, and my students played a big part in the process. Since I could¡¡ ix make instant changes whenever someone found a spelling error or difficult passage,¡¡I encouraged them to look for mistakes in the book by giving them a bonus point¡¡each time they made a suggestion that resulted in a change in the text. This had¡¡the double benefit of encouraging them to read the text more carefully and of¡¡getting the text thoroughly reviewed by its most important critics, students using it to learn computer science. For the second half of the book on object-oriented programming, I knew that¡¡someone with more real programming experience than I had would be needed to¡¡do it right. The book sat in an unfinished state for the better part of a year¡¡until the free software community once again provided the needed means for its¡¡completion. I received an email from Chris Meyers expressing interest in the book. Chris is¡¡a professional programmer who started teaching a programming course last year¡¡using Python at Lane Community College in Eugene, Oregon. The prospect of¡¡teaching the course had led Chris to the book, and he started helping out with it¡¡immediately. By the end of the school year he had created a companion project¡¡on our website at¡¡http://www.ibiblio.org/obp called Python for Fun and was¡¡working with some of my most advanced students as a master teacher, guiding them beyond where I could take them. Introducing programming with Python The process of translating and using How to Think Like a Computer Scientist¡¡for the past two years has confirmed Python¡Çs suitability for teaching beginning¡¡students. Python greatly simplifies programming examples and makes important¡¡programming ideas easier to teach. The first example from the text illustrates this point. It is the traditional ¡Èhello,world¡É program, which in the C++ version of the book looks like this: #include <iostream.h> void main() { cout << "Hello, world." << endl; } in the Python version it becomes: print "Hello, World!" Even though this is a trivial example, the advantages of Python stand out. Yorktown¡Çs¡¡Computer Science I course has no prerequisites, so many of the students x Preface seeing this example are looking at their first program. Some of them are undoubtedly¡¡a little nervous, having heard that computer programming is difficult to learn. The C++ version has always forced me to choose between two unsatisfying options: either to explain #include, void main(), {, and }, and risk confusing or intimidating some of the students right at the start, or to tell them, ¡ÈJust don¡Çt¡¡worry about all of that stuff now; we will talk about it later,¡É and risk the same¡¡thing. The educational objectives at this point in the course are to introduce¡¡students to the idea of a programming language and to get them to write their¡¡first program, thereby introducing them to the programming environment. The¡¡Python program has exactly what is needed to do these things, and nothing more. Comparing the explanatory text of the program in each version of the book further¡¡illustrates what this means to the beginning student. There are thirteen¡¡paragraphs of explanation of ¡ÈHello, world!¡É in the C++ version; in the Python¡¡version, there are only two. More importantly, the missing eleven paragraphs do¡¡not deal with the ¡Èbig ideas¡É in computer programming but with the minutia of¡¡C++ syntax. I found this same thing happening throughout the book. Whole¡¡paragraphs simply disappear from the Python version of the text because Python¡Çs¡¡much clearer syntax renders them unnecessary. Using a very high-level language like Python allows a teacher to postpone talking¡¡about low-level details of the machine until students have the background that¡¡they need to better make sense of the details. It thus creates the ability to put¡¡¡Èfirst things first¡É pedagogically. One of the best examples of this is the way in¡¡which Python handles variables. In C++ a variable is a name for a place that¡¡holds a thing. Variables have to be declared with types at least in part because¡¡the size of the place to which they refer needs to be predetermined. Thus, the¡¡idea of a variable is bound up with the hardware of the machine. The powerful and fundamental concept of a variable is already difficult enough for beginning¡¡students (in both computer science and algebra). Bytes and addresses do not help¡¡the matter. In Python a variable is a name that refers to a thing. This is a far¡¡more intuitive concept for beginning students and is much closer to the meaning of ¡Èvariable¡É that they learned in their math courses. I had much less difficulty teaching variables this year than I did in the past, and I spent less time helping students with problems using them. Another example of how Python aids in the teaching and learning of programming is in its syntax for functions. My students have always had a great deal of difficulty understanding functions. The main problem centers around the difference between¡¡a function definition and a function call, and the related distinction between a¡¡parameter and an argument. Python comes to the rescue with syntax that is¡¡nothing short of beautiful. Function definitions begin with the keyword def, so I simply tell my students, ¡ÈWhen you define a function, begin with def, followed by¡¡the name of the function that you are defining; when you call a function, simply xi call (type) out its name.¡É Parameters go with definitions; arguments go with calls¡¡There are no return types, parameter types, or reference and value parameters to¡¡get in the way, so I am now able to teach functions in less than half the time that¡¡it previously took me, with better comprehension¡¡Using Python has improved the effectiveness of our computer science program for¡¡all students. I see a higher general level of success and a lower level of frustration¡¡than I experienced during the two years I taught C++. I move faster with better¡¡results. More students leave the course with the ability to create meaningful programs and with the positive attitude toward the experience of programming¡¡that this engenders¡¡Building a community I have received email from all over the globe from people using this book to learn or¡¡to teach programming. A user community has begun to emerge, and many people¡¡have been contributing to the project by sending in materials for the companion¡¡websiteat http://www.thinkpython.com. With the publication of the book in print form, I expect the growth in the user¡¡community to continue and accelerate. The emergence of this user community and¡¡the possibility it suggests for similar collaboration among educators have been the¡¡most exciting parts of working on this project for me. By working together, we¡¡can increase the quality of materials available for our use and save valuable timeI invite you to join our community and look forward to hearing from you. Please write to the authors at feedback@thinkpython.com. Jeffrey Elkner Yorktown High School Arlington, Virginia xii Preface
Contributor List
To paraphrase the philosophy of the Free Software Foundation, this book is free¡¡like free speech, but not necessarily free like free pizza. It came about because of¡¡a collaboration that would not have been possible without the GNU Free Documentation¡¡License. So we thank the Free Software Foundation for developing this¡¡license and, of course, making it available to us. We also thank the more than 100 sharp-eyed and thoughtful readers who have¡¡sent us suggestions and corrections over the past few years. In the spirit of free¡¡software, we decided to express our gratitude in the form of a contributor list¡¡Unfortunately, this list is not complete, but we are doing our best to keep it up¡¡to date. If you have a chance to look through the list, you should realize that each person¡¡here has spared you and all subsequent readers from the confusion of a technical¡¡error or a less-than-transparent explanation, just by sending us a note. Impossible as it may seem after so many corrections, there may still be errors¡¡in this book. If you should stumble across one, please check the online version¡¡of the book at http://thinkpython.com, which is the most up-to-date version. If the error has not been corrected, please take a minute to send us email at¡¡feedback@thinkpython.com. If we make a change due to your suggestion, you will¡¡appear in the next version of the contributor list (unless you ask to be omitted). Thank you! • Lloyd Hugh Allen sent in a correction to Section 8.4. • Yvon Boulianne sent in a correction of a semantic error in Chapter 5. • Fred Bremmer submitted a correction in Section 2.1. • Jonah Cohen wrote the Perl scripts to convert the LaTeX source for this¡¡book into beautiful HTML. xiv Contributor List • Michael Conlon sent in a grammar correction in Chapter 2 and an improvement¡¡in style in Chapter 1, and he initiated discussion on the technical¡¡aspects of interpreters. • Benoit Girard sent in a correction to a humorous mistake in Section 5.6. • Courtney Gleason and Katherine Smith wrote horsebet.py, which was used¡¡as a case study in an earlier version of the book. Their program can now be¡¡found on the website. • Lee Harr submitted more corrections than we have room to list here, and¡¡indeed he should be listed as one of the principal editors of the text. • James Kaylin is a student using the text. He has submitted numerous corrections. • David Kershaw fixed the broken catTwice function in Section 3.10. • Eddie Lam has sent in numerous corrections to Chapters 1, 2, and 3. He¡¡also fixed the Makefile so that it creates an index the first time it is run and¡¡helped us set up a versioning scheme. • Man-Yong Lee sent in a correction to the example code in Section 2.4. • David Mayo pointed out that the word ¡Èunconsciously¡É in Chapter 1 needed¡¡to be changed to ¡Èsubconsciously¡É. • Chris McAloon sent in several corrections to Sections 3.9 and 3.10. • Matthew J. Moelter has been a long-time contributor who sent in numerous corrections and suggestions to the book. • Simon Dicon Montford reported a missing function definition and several¡¡typos in Chapter 3. He also found errors in the increment function in¡¡Chapter 13. • John Ouzts corrected the definition of ¡Èreturn value¡É in Chapter 3. • Kevin Parks sent in valuable comments and suggestions as to how to improve¡¡the distribution of the book. • David Pool sent in a typo in the glossary of Chapter 1, as well as kind words¡¡of encouragement. • Michael Schmitt sent in a correction to the chapter on files and exceptions. • Robin Shaw pointed out an error in Section 13.1, where the printTime function¡¡was used in an example without being defined. xv • Paul Sleigh found an error in Chapter 7 and a bug in Jonah Cohen¡Çs Perl¡¡script that generates HTML from LaTeX. • Craig T. Snydal is testing the text in a course at Drew University. He has¡¡contributed several valuable suggestions and corrections. • Ian Thomas and his students are using the text in a programming course. They are the first ones to test the chapters in the latter half of the book,¡¡and they have made numerous corrections and suggestions. • Keith Verheyden sent in a correction in Chapter 3. • Peter Winstanley let us know about a longstanding error in our Latin in Chapter 3. • Chris Wrobel made corrections to the code in the chapter on file I/O and¡¡exceptions. • Moshe Zadka has made invaluable contributions to this project. In addition¡¡to writing the first draft of the chapter on Dictionaries, he provided continual¡¡guidance in the early stages of the book. • Christoph Zwerschke sent several corrections and pedagogic suggestions, and¡¡explained the difference between gleich and selbe. • James Mayer sent us a whole slew of spelling and typographical errors,¡¡including two in the contributor list. • Hayden McAfee caught a potentially confusing inconsistency between two¡¡examples. • Angel Arnal is part of an international team of translators working on the¡¡Spanish version of the text. He has also found several errors in the English¡¡version. • Tauhidul Hoque and Lex Berezhny created the illustrations in Chapter 1¡¡and improved many of the other illustrations. • Dr. Michele Alzetta caught an error in Chapter 8 and sent some interesting¡¡pedagogic comments and suggestions about Fibonacci and Old Maid. • Andy Mitchell caught a typo in Chapter 1 and a broken example in Chapter¡¡2. • Kalin Harvey suggested a clarification in Chapter 7 and caught some typos. • Christopher P. Smith caught several typos and is helping us prepare to¡¡update the book for Python 2.2. xvi Contributor List • David Hutchins caught a typo in the Foreword. • Gregor Lingl is teaching Python at a high school in Vienna, Austria. He is working on a German translation of the book, and he caught a couple of¡¡bad errors in Chapter 5. • Julie Peters caught a typo in the Preface. • Florin Oprina sent in an improvement in makeTime, a correction in printTime, and a nice typo. • D. J. Webre suggested a clarification in Chapter 3. • Ken found a fistful of errors in Chapters 8, 9 and 11. • Ivo Wever caught a typo in Chapter 5 and suggested a clarification in Chapter3. • Curtis Yanko suggested a clarification in Chapter 2. • Ben Logan sent in a number of typos and problems with translating the¡¡book into HTML. • Jason Armstrong saw the missing word in Chapter 2. • Louis Cordier noticed a spot in Chapter 16 where the code didn¡Çt match the text. • Brian Cain suggested several clarifications in Chapters 2 and 3. • Rob Black sent in a passel of corrections, including some changes for Python2.2. • Jean-Philippe Rey at Ecole Centrale Paris sent a number of patches, including¡¡some updates for Python 2.2 and other thoughtful improvements. • Jason Mader at George Washington University made a number of useful¡¡suggestions and corrections. • Jan Gundtofte-Bruun reminded us that ¡Èa error¡É is an error. • Abel David and Alexis Dinno reminded us that the plural of ¡Èmatrix¡É is ¡Èmatrices¡É, not ¡Èmatrixes¡É. This error was in the book for years, but two¡¡readers with the same initials reported it on the same day. Weird. • Charles Thayer encouraged us to get rid of the semi-colons we had put at¡¡the ends of some statements and to clean up our use of ¡Èargument¡É and¡Èparameter¡É. xvii • Roger Sperberg pointed out a twisted piece of logic in Chapter 3. • Sam Bull pointed out a confusing paragraph in Chapter 2. • Andrew Cheung pointed out two instances of ¡Èuse before def.¡É • Hans Batra found an error in Chapter 16. • Chris Seberino suggested some improvements in the Preface. • Yuri Takhteyev pointed out a problem with single and double quotes. xviii Contributor List Contents Foreword v Preface vii Contributor List xiii
1 The way of the program 1 1.1 The Python programming language . . . . . . . . . . . . . . . . . 1 1.2 What is a program? . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 What is debugging? . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.4 Formal and natural languages . . . . . . . . . . . . . . . . . . . . 6 1.5 The first program . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.6 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2 Variables, expressions and statements 11 2.1 Values and types . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3 Variable names and keywords . . . . . . . . . . . . . . . . . . . . 13 2.4 Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.5 Evaluating expressions . . . . . . . . . . . . . . . . . . . . . . . . 16 2.6 Operators and operands . . . . . . . . . . . . . . . . . . . . . . . 17 xx Contents 2.7 Order of operations . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.8 Operations on strings . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.9 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.10 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.11 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3 Functions 23 3.1 Function calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 Type conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.3 Type coercion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.4 Math functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.5 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.6 Adding new functions . . . . . . . . . . . . . . . . . . . . . . . . 26 3.7 Definitions and use . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.8 Flow of execution . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.9 Parameters and arguments . . . . . . . . . . . . . . . . . . . . . . 30 3.10 Variables and parameters are local . . . . . . . . . . . . . . . . . 31 3.11 Stack diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.12 Functions with results . . . . . . . . . . . . . . . . . . . . . . . . 33 3.13 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4 Conditionals and recursion 37 4.1 The modulus operator . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2 Boolean expressions . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.3 Logical operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.4 Conditional execution . . . . . . . . . . . . . . . . . . . . . . . . 39 4.5 Alternative execution . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.6 Chained conditionals . . . . . . . . . . . . . . . . . . . . . . . . . 40 Contents xxi 4.7 Nested conditionals . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.8 The return statement . . . . . . . . . . . . . . . . . . . . . . . . 42 4.9 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.10 Stack diagrams for recursive functions . . . . . . . . . . . . . . . 44 4.11 Infinite recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.12 Keyboard input . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.13 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5 Fruitful functions 49 5.1 Return values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 5.2 Program development . . . . . . . . . . . . . . . . . . . . . . . . 50 5.3 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.4 Boolean functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5.5 More recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 5.6 Leap of faith . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 5.7 One more example . . . . . . . . . . . . . . . . . . . . . . . . . . 58 5.8 Checking types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 5.9 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 6 Iteration 61 6.1 Multiple assignment . . . . . . . . . . . . . . . . . . . . . . . . . 61 6.2 The while statement . . . . . . . . . . . . . . . . . . . . . . . . . 62 6.3 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.4 Two-dimensional tables . . . . . . . . . . . . . . . . . . . . . . . 66 6.5 Encapsulation and generalization . . . . . . . . . . . . . . . . . . 67 6.6 More encapsulation . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.7 Local variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.8 More generalization . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.9 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 6.10 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 xxii Contents 7 Strings 73 7.1 A compound data type . . . . . . . . . . . . . . . . . . . . . . . . 73 7.2 Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 7.3 Traversal and the for loop . . . . . . . . . . . . . . . . . . . . . . 74 7.4 String slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 7.5 String comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 76 7.6 Strings are immutable . . . . . . . . . . . . . . . . . . . . . . . . 77 7.7 A find function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 7.8 Looping and counting . . . . . . . . . . . . . . . . . . . . . . . . 78 7.9 The string module . . . . . . . . . . . . . . . . . . . . . . . . . 79 7.10 Character classification . . . . . . . . . . . . . . . . . . . . . . . . 80 7.11 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 8 Lists 83 8.1 List values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 8.2 Accessing elements . . . . . . . . . . . . . . . . . . . . . . . . . . 84 8.3 List length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 8.4 List membership . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 8.5 Lists and for loops . . . . . . . . . . . . . . . . . . . . . . . . . . 86 8.6 List operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 8.7 List slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 8.8 Lists are mutable . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 8.9 List deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 8.10 Objects and values . . . . . . . . . . . . . . . . . . . . . . . . . . 91 8.11 Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 8.12 Cloning lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 8.13 List parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 8.14 Nested lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 Contents xxiii 8.15 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 8.16 Strings and lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 8.17 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 9 Tuples 97 9.1 Mutability and tuples . . . . . . . . . . . . . . . . . . . . . . . . 97 9.2 Tuple assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 9.3 Tuples as return values . . . . . . . . . . . . . . . . . . . . . . . . 99 9.4 Random numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 9.5 List of random numbers . . . . . . . . . . . . . . . . . . . . . . . 100 9.6 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 9.7 Many buckets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 9.8 A single-pass solution . . . . . . . . . . . . . . . . . . . . . . . . . 104 9.9 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 10 Dictionaries 107 10.1 Dictionary operations . . . . . . . . . . . . . . . . . . . . . . . . . 108 10.2 Dictionary methods . . . . . . . . . . . . . . . . . . . . . . . . . . 109 10.3 Aliasing and copying . . . . . . . . . . . . . . . . . . . . . . . . . 110 10.4 Sparse matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 10.5 Hints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 10.6 Long integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 10.7 Counting letters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 10.8 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 11 Files and exceptions 117 11.1 Text files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 11.2 Writing variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 11.3 Directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 xxiv Contents 11.4 Pickling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 11.5 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 11.6 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 12 Classes and objects 129 12.1 User-defined compound types . . . . . . . . . . . . . . . . . . . . 129 12.2 Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 12.3 Instances as arguments . . . . . . . . . . . . . . . . . . . . . . . . 131 12.4 Sameness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 12.5 Rectangles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 12.6 Instances as return values . . . . . . . . . . . . . . . . . . . . . . 134 12.7 Objects are mutable . . . . . . . . . . . . . . . . . . . . . . . . . 134 12.8 Copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 12.9 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 13 Classes and functions 139 13.1 Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 13.2 Pure functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 13.3 Modifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 13.4 Which is better? . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 13.5 Prototype development versus planning . . . . . . . . . . . . . . 143 13.6 Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 13.7 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 13.8 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 14 Classes and methods 147 14.1 Object-oriented features . . . . . . . . . . . . . . . . . . . . . . . 147 14.2 printTime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 14.3 Another example . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Contents xxv 14.4 A more complicated example . . . . . . . . . . . . . . . . . . . . 150 14.5 Optional arguments . . . . . . . . . . . . . . . . . . . . . . . . . . 151 14.6 The initialization method . . . . . . . . . . . . . . . . . . . . . . 152 14.7 Points revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 14.8 Operator overloading . . . . . . . . . . . . . . . . . . . . . . . . . 154 14.9 Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 14.10 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 15 Sets of objects 159 15.1 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 15.2 Card objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 15.3 Class attributes and the str method . . . . . . . . . . . . . . 161 15.4 Comparing cards . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 15.5 Decks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 15.6 Printing the deck . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 15.7 Shuffling the deck . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 15.8 Removing and dealing cards . . . . . . . . . . . . . . . . . . . . . 166 15.9 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 16 Inheritance 169 16.1 Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 16.2 A hand of cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 16.3 Dealing cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 16.4 Printing a Hand . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 16.5 The CardGame class . . . . . . . . . . . . . . . . . . . . . . . . . . 172 16.6 OldMaidHand class . . . . . . . . . . . . . . . . . . . . . . . . . . 173 16.7 OldMaidGame class . . . . . . . . . . . . . . . . . . . . . . . . . . 175 16.8 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 xxvi Contents 17 Linked lists 181 17.1 Embedded references . . . . . . . . . . . . . . . . . . . . . . . . . 181 17.2 The Node class . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 17.3 Lists as collections . . . . . . . . . . . . . . . . . . . . . . . . . . 183 17.4 Lists and recursion . . . . . . . . . . . . . . . . . . . . . . . . . . 184 17.5 Infinite lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 17.6 The fundamental ambiguity theorem . . . . . . . . . . . . . . . . 186 17.7 Modifying lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 17.8 Wrappers and helpers . . . . . . . . . . . . . . . . . . . . . . . . 187 17.9 The LinkedList class . . . . . . . . . . . . . . . . . . . . . . . . 188 17.10 Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 17.11 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 18 Stacks 191 18.1 Abstract data types . . . . . . . . . . . . . . . . . . . . . . . . . 191 18.2 The Stack ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 18.3 Implementing stacks with Python lists . . . . . . . . . . . . . . . 192 18.4 Pushing and popping . . . . . . . . . . . . . . . . . . . . . . . . . 193 18.5 Using a stack to evaluate postfix . . . . . . . . . . . . . . . . . . 194 18.6 Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 18.7 Evaluating postfix . . . . . . . . . . . . . . . . . . . . . . . . . . 195 18.8 Clients and providers . . . . . . . . . . . . . . . . . . . . . . . . . 196 18.9 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 19 Queues 199 19.1 The Queue ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 19.2 Linked Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 19.3 Performance characteristics . . . . . . . . . . . . . . . . . . . . . 201 Contents xxvii 19.4 Improved Linked Queue . . . . . . . . . . . . . . . . . . . . . . . 201 19.5 Priority queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 19.6 The Golfer class . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 19.7 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 20 Trees 207 20.1 Building trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 20.2 Traversing trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 20.3 Expression trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 20.4 Tree traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 20.5 Building an expression tree . . . . . . . . . . . . . . . . . . . . . 212 20.6 Handling errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 20.7 The animal tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 20.8 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 A Debugging 221 A.1 Syntax errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221 A.2 Runtime errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 A.3 Semantic errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 B Creating a new data type 231 B.1 Fraction multiplication . . . . . . . . . . . . . . . . . . . . . . . . 232 B.2 Fraction addition . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 B.3 Euclid¡Çs algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 234 B.4 Comparing fractions . . . . . . . . . . . . . . . . . . . . . . . . . 235 B.5 Taking it further . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 B.6 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 C Recommendations for further reading 239 C.1 Python-related web sites and books . . . . . . . . . . . . . . . . . 240 C.2 Recommended general computer science books . . . . . . . . . . 241 xxviii Contents
Chapter 1
The way of the program The goal of this book is to teach you to think like a computer scientist. This way¡¡of thinking combines some of the best features of mathematics, engineering, and¡¡natural science. Like mathematicians, computer scientists use formal languages¡¡to denote ideas (specifically computations). Like engineers, they design things, assembling components into systems and evaluating tradeoffs among alternatives. Like scientists, they observe the behavior of complex systems, form hypotheses,¡¡and test predictions. The single most important skill for a computer scientist is problem solving. Problem solving means the ability to formulate problems, think creatively about¡¡solutions, and express a solution clearly and accurately. As it turns out, the¡¡process of learning to program is an excellent opportunity to practice problemsolving¡¡skills. That¡Çs why this chapter is called, ¡ÈThe way of the program.¡É On one level, you will be learning to program, a useful skill by itself. On another¡¡level, you will use programming as a means to an end. As we go along, that end¡¡will become clearer.
1.1 The Python programming language
The programming language you will be learning is Python. Python is an example¡¡of a high-level language; other high-level languages you might have heard of¡¡are C, C++, Perl, and Java. As you might infer from the name ¡Èhigh-level language,¡É there are also lowlevel¡¡£ìanguages, sometimes referred to as ¡Èmachine languages¡É or ¡Èassembly
2 The way of the program
languages.¡É Loosely speaking, computers can only execute programs written in¡¡low-level languages. Thus, programs written in a high-level language have to be¡¡processed before they can run. This extra processing takes some time, which is a¡¡small disadvantage of high-level languages. But the advantages are enormous. First, it is much easier to program in a highlevel¡¡language. Programs written in a high-level language take less time to write,¡¡they are shorter and easier to read, and they are more likely to be correct. Second,¡¡high-level languages are portable, meaning that they can run on different kinds¡¡of computers with few or no modifications. Low-level programs can run on only¡¡one kind of computer and have to be rewritten to run on another. Due to these advantages, almost all programs are written in high-level languages. Low-level languages are used only for a few specialized applications. Two kinds of programs process high-level languages into low-level languages: interpreters¡¡and compilers. An interpreter reads a high-level program and executes¡¡it, meaning that it does what the program says. It processes the program a¡¡little at a time, alternately reading lines and performing computations. SOURCE OUTPUT CODE INTERPRETER A compiler reads the program and translates it completely before the program¡¡starts running. In this case, the high-level program is called the source code,¡¡and the translated program is called the object code or the executable. Once¡¡a program is compiled, you can execute it repeatedly without further translation. OUTPUT CODE OBJECT EXECUTOR CODE SOURCE COMPILER Python is considered an interpreted language because Python programs are executed¡¡by an interpreter. There are two ways to use the interpreter: command-line¡¡mode and script mode. In command-line mode, you type Python programs and¡¡the interpreter prints the result:
1.2 What is a program? 3
$ python Python 2.4.1 (#1, Apr 29 2005, 00:28:56) Type "help", "copyright", "credits" or "license" for more information. >>> print 1 + 1 2 The first line of this example is the command that starts the Python interpreter. The next two lines are messages from the interpreter. The third line starts with¡¡>>>, which is the prompt the interpreter uses to indicate that it is ready. We¡¡typed print 1 + 1, and the interpreter replied 2. Alternatively, you can write a program in a file and use the interpreter to execute¡¡the contents of the file. Such a file is called a script. For example, we used a text¡¡editor to create a file named latoya.py with the following contents: print 1 + 1 By convention, files that contain Python programs have names that end with .py. To execute the program, we have to tell the interpreter the name of the script: $ python latoya.py 2 In other development environments, the details of executing programs may differ. Also, most programs are more interesting than this one. Most of the examples in this book are executed on the command line. Working¡¡on the command line is convenient for program development and testing, because¡¡you can type programs and execute them immediately. Once you have a working¡¡program, you should store it in a script so you can execute or modify it in the¡¡future.
1.2 What is a program?
A program is a sequence of instructions that specifies how to perform a computation. The computation might be something mathematical, such as solving a system of equations or finding the roots of a polynomial, but it can also be¡¡a symbolic computation, such as searching and replacing text in a document or¡¡(strangely enough) compiling a program. The details look different in different languages, but a few basic instructions appear¡¡in just about every language: input: Get data from the keyboard, a file, or some other device. 4 The way of the program output: Display data on the screen or send data to a file or other device. math: Perform basic mathematical operations like addition and multiplication. conditional execution: Check for certain conditions and execute the appropriate¡¡sequence of statements. repetition: Perform some action repeatedly, usually with some variation. Believe it or not, that¡Çs pretty much all there is to it. Every program you¡Çve ever¡¡used, no matter how complicated, is made up of instructions that look more or¡¡less like these. Thus, we can describe programming as the process of breaking a¡¡large, complex task into smaller and smaller subtasks until the subtasks are simple¡¡enough to be performed with one of these basic instructions. That may be a little vague, but we will come back to this topic later when we talk¡¡about algorithms.
1.3 What is debugging?
Programming is a complex process, and because it is done by human beings, it¡¡often leads to errors. For whimsical reasons, programming errors are called bugs¡¡and the process of tracking them down and correcting them is called debugging. Three kinds of errors can occur in a program: syntax errors, runtime errors, and¡¡semantic errors. It is useful to distinguish between them in order to track them¡¡down more quickly.
1.3.1 Syntax errors
Python can only execute a program if the program is syntactically correct; otherwise,¡¡the process fails and returns an error message. Syntax refers to the¡¡structure of a program and the rules about that structure. For example, in English,¡¡a sentence must begin with a capital letter and end with a period. this¡¡sentence contains a syntax error. So does this one¡¡For most readers, a few syntax errors are not a significant problem, which is why¡¡we can read the poetry of e. e. cummings without spewing error messages. Python¡¡is not so forgiving. If there is a single syntax error anywhere in your program, Python will print an error message and quit, and you will not be able to run¡¡your program. During the first few weeks of your programming career, you will¡¡probably spend a lot of time tracking down syntax errors. As you gain experience,¡¡though, you will make fewer errors and find them faster.
1.3 What is debugging? 5
1.3.2 Runtime errors
The second type of error is a runtime error, so called because the error does¡¡not appear until you run the program. These errors are also called exceptions¡¡because they usually indicate that something exceptional (and bad) has happened. Runtime errors are rare in the simple programs you will see in the first few chapters,¡¡so it might be a while before you encounter one.
1.3.3 Semantic errors
The third type of error is the semantic error. If there is a semantic error in your¡¡program, it will run successfully, in the sense that the computer will not generate¡¡any error messages, but it will not do the right thing. It will do something else. Specifically, it will do what you told it to do. The problem is that the program you wrote is not the program you wanted to¡¡write. The meaning of the program (its semantics) is wrong. Identifying semantic¡¡errors can be tricky because it requires you to work backward by looking at the¡¡output of the program and trying to figure out what it is doing.
1.3.4 Experimental debugging
One of the most important skills you will acquire is debugging. Although it can¡¡be frustrating, debugging is one of the most intellectually rich, challenging, and¡¡interesting parts of programming. In some ways, debugging is like detective work. You are confronted with clues,¡¡and you have to infer the processes and events that led to the results you see. Debugging is also like an experimental science. Once you have an idea what is¡¡going wrong, you modify your program and try again. If your hypothesis was¡¡correct, then you can predict the result of the modification, and you take a step¡¡closer to a working program. If your hypothesis was wrong, you have to come up¡¡with a new one. As Sherlock Holmes pointed out, ¡ÈWhen you have eliminated¡¡the impossible, whatever remains, however improbable, must be the truth.¡É (A¡¡Conan Doyle, The Sign of Four) For some people, programming and debugging are the same thing. That is, programming¡¡is the process of gradually debugging a program until it does what you¡¡want. The idea is that you should start with a program that does something and¡¡make small modifications, debugging them as you go, so that you always have a¡¡working program. 6 The way of the program For example, Linux is an operating system that contains thousands of lines of¡¡code, but it started out as a simple program Linus Torvalds used to explore the¡¡Intel 80386 chip. According to Larry Greenfield, ¡ÈOne of Linus¡Çs earlier projects¡¡was a program that would switch between printing AAAA and BBBB. This later¡¡evolved to Linux.¡É (The Linux Users¡Ç Guide Beta Version 1) Later chapters will make more suggestions about debugging and other programming¡¡practices.
1.4 Formal and natural languages
Natural languages are the languages that people speak, such as English, Spanish,¡¡and French. They were not designed by people (although people try to impose¡¡some order on them); they evolved naturally. Formal languages are languages that are designed by people for specific applications. For example, the notation that mathematicians use is a formal language that is particularly good at denoting relationships among numbers and symbols. Chemists use a formal language to represent the chemical structure of molecules. And most importantly: Programming languages are formal languages that have been designed to express computations. Formal languages tend to have strict rules about syntax. For example, 3 + 3 = 6¡¡is a syntactically correct mathematical statement, but 3=+6$ is not. H2O is a¡¡syntactically correct chemical name, but 2Zz is not. Syntax rules come in two flavors, pertaining to tokens and structure. Tokens are¡¡the basic elements of the language, such as words, numbers, and chemical elements. One of the problems with 3=+6$ is that $ is not a legal token in mathematics (atleast as far as we know). Similarly, 2Zz is not legal because there is no element¡¡with the abbreviation Zz. The second type of syntax error pertains to the structure of a statement—that¡¡is, the way the tokens are arranged. The statement 3=+6$ is structurally illegal¡¡because you can¡Çt place a plus sign immediately after an equal sign. Similarly,¡¡molecular formulas have to have subscripts after the element name, not before¡¡As an exercise, create what appears to be a well-structured English¡¡sentence with unrecognizable tokens in it. Then write another sentence with all valid tokens but with invalid structure.
1.4 Formal and natural languages 7
When you read a sentence in English or a statement in a formal language, you¡¡have to figure out what the structure of the sentence is (although in a natural¡¡language you do this subconsciously). This process is called parsing. For example, when you hear the sentence, ¡ÈThe other shoe fell,¡É you understand¡¡that ¡Èthe other shoe¡É is the subject and ¡Èfell¡É is the predicate. Once you have¡¡parsed a sentence, you can figure out what it means, or the semantics of the¡¡sentence. Assuming that you know what a shoe is and what it means to fall, you¡¡will understand the general implication of this sentence. Although formal and natural languages have many features in common—tokens,¡¡structure, syntax, and semantics—there are many differences: ambiguity: Natural languages are full of ambiguity, which people deal with by¡¡using contextual clues and other information. Formal languages are designed¡¡to be nearly or completely unambiguous, which means that any statement¡¡has exactly one meaning, regardless of context. redundancy: In order to make up for ambiguity and reduce misunderstandings,¡¡natural languages employ lots of redundancy. As a result, they are often¡¡verbose. Formal languages are less redundant and more concise. literalness: Natural languages are full of idiom and metaphor. If I say, ¡ÈThe¡¡other shoe fell,¡É there is probably no shoe and nothing falling. Formal¡¡languages mean exactly what they say. People who grow up speaking a natural language—everyone—often have a hard¡¡time adjusting to formal languages. In some ways, the difference between formal¡¡and natural language is like the difference between poetry and prose, but more so: Poetry: Words are used for their sounds as well as for their meaning, and the¡¡whole poem together creates an effect or emotional response. Ambiguity is¡¡not only common but often deliberate. Prose: The literal meaning of words is more important, and the structure contributes¡¡more meaning. Prose is more amenable to analysis than poetry but¡¡still often ambiguous. Programs: The meaning of a computer program is unambiguous and literal, and¡¡can be understood entirely by analysis of the tokens and structure. Here are some suggestions for reading programs (and other formal languages). First, remember that formal languages are much more dense than natural languages,¡¡so it takes longer to read them. Also, the structure is very important, so¡¡it is usually not a good idea to read from top to bottom, left to right. Instead,
8 The way of the program
learn to parse the program in your head, identifying the tokens and interpreting¡¡the structure. Finally, the details matter. Little things like spelling errors and¡¡bad punctuation, which you can get away with in natural languages, can make a¡¡big difference in a formal language.
1.5 The first program
Traditionally, the first program written in a new language is called ¡ÈHello, World!¡É because all it does is display the words, ¡ÈHello, World!¡É In Python, it looks like this: print "Hello, World!" This is an example of a print statement, which doesn¡Çt actually print anything on paper. It displays a value on the screen. In this case, the result is the words Hello, World! The quotation marks in the program mark the beginning and end of the value; they don¡Çt appear in the result. Some people judge the quality of a programming language by the simplicity of¡¡the ¡ÈHello, World!¡É program. By this standard, Python does about as well as¡¡possible.
1.6 Glossary
problem solving: The process of formulating a problem, finding a solution, and¡¡expressing the solution. high-level language: A programming language like Python that is designed to¡¡be easy for humans to read and write¡¡low-level language: A programming language that is designed to be easy for¡¡a computer to execute; also called ¡Èmachine language¡É or ¡Èassembly language.¡É portability: A property of a program that can run on more than one kind of¡¡computer. interpret: To execute a program in a high-level language by translating it one¡¡line at a time. compile: To translate a program written in a high-level language into a low-level¡¡language all at once, in preparation for later execution.
1.6 Glossary 9
source code: A program in a high-level language before being compiled. object code: The output of the compiler after it translates the program. executable: Another name for object code that is ready to be executed. script: A program stored in a file (usually one that will be interpreted). program: A set of instructions that specifies a computation. algorithm: A general process for solving a category of problems. bug: An error in a program. debugging: The process of finding and removing any of the three kinds of programming¡¡errors. syntax: The structure of a program. syntax error: An error in a program that makes it impossible to parse (and¡¡therefore impossible to interpret). runtime error: An error that does not occur until the program has started to¡¡execute but that prevents the program from continuing. exception: Another name for a runtime error. semantic error: An error in a program that makes it do something other than¡¡what the programmer intended. semantics: The meaning of a program. natural language: Any one of the languages that people speak that evolved naturally. formal language: Any one of the languages that people have designed for specific purposes, such as representing mathematical ideas or computer programs; all programming languages are formal languages. token: One of the basic elements of the syntactic structure of a program, analogous to a word in a natural language. parse: To examine a program and analyze the syntactic structure. print statement: An instruction that causes the Python interpreter to display a value on the screen.
10 The way of the program
Chapter 2 Variables, expressions and statements
2.1 Values and types
A value is one of the fundamental things—like a letter or a number—that a program manipulates. The values we have seen so far are 2 (the result when we added 1 + 1), and ¡ÇHello, World!¡Ç These values belong to different types: 2 is an integer, and ¡ÇHello, World!¡Çis a string, so-called because it contains a ¡Èstring¡É of letters. You (and the interpreter) can identify strings because they are enclosed in quotation marks. The print statement also works for integers. >>> print 4 4 If you are not sure what type a value has, the interpreter can tell you. >>> type(¡ÇHello, World!¡Ç) <type ¡Çstr¡Ç> >>> type(17) <type ¡Çint¡Ç> Not surprisingly, strings belong to the type str and integers belong to the type int. Less obviously, numbers with a decimal point belong to a type called float,because these numbers are represented in a format called floating-point.
12 Variables, expressions and statements
>>> type(3.2) <type ¡Çfloat¡Ç> What about values like ¡Ç17¡Ç and ¡Ç3.2¡Ç? They look like numbers, but they are in quotation marks like strings. >>> type(¡Ç17¡Ç) <type ¡Çstr¡Ç> >>> type(¡Ç3.2¡Ç) <type ¡Çstr¡Ç> They¡Çre strings. When you type a large integer, you might be tempted to use commas between groups of three digits, as in 1,000,000. This is not a legal integer in Python, but it is a legal expression: >>> print 1,000,000 1 0 0 Well, that¡Çs not what we expected at all! Python interprets 1,000,000 as a comma-separated list of three integers, which it prints consecutively. This is themfirst example we have seen of a semantic error: the code runs without producing an error message, but it doesn¡Çt do the ¡Èright¡É thing.
2.2 Variables
One of the most powerful features of a programming language is the ability to manipulate variables. A variable is a name that refers to a value. The assignment statement creates new variables and gives them values: >>> message = "What¡Çs up, Doc?" >>> n = 17 >>> pi = 3.14159 This example makes three assignments. The first assigns the string "What¡Çs up,Doc?" to a new variable named message. The second gives the integer 17 to n,and the third gives the floating-point number 3.14159 to pi. Notice that the first statement uses double quotes to enclose the string. In general,single and double quotes do the same thing, but if the string contains a single quote (or an apostrophe, which is the same character), you have to use double quotes to enclose it.
2.3 Variable names and keywords 13
A common way to represent variables on paper is to write the name with an arrow pointing to the variable¡Çs value. This kind of figure is called a state diagram because it shows what state each of the variables is in (think of it as the variable¡Çs state of mind). This diagram shows the result of the assignment statements: message n pi "What¡Çs up, Doc?" 17 3.14159 The print statement also works with variables. >>> print message What¡Çs up, Doc? >>> print n 17 >>> print pi 3.14159 In each case the result is the value of the variable. Variables also have types; again, we can ask the interpreter what they are. >>> type(message) <type ¡Çstr¡Ç> >>> type(n) <type ¡Çint¡Ç> >>> type(pi) <type ¡Çfloat¡Ç> The type of a variable is the type of the value it refers to.
2.3 Variable names and keywords
Programmers generally choose names for their variables that are meaningful—they document what the variable is used for. Variable names can be arbitrarily long. They can contain both letters and numbers,but they have to begin with a letter. Although it is legal to use uppercase letters, by convention we don¡Çt. If you do, remember that case matters. Bruce and bruce are different variables. The underscore character ( ) can appear in a name. It is often used in names with multiple words, such as my name or price of tea in china.
14 Variables, expressions and statements
If you give a variable an illegal name, you get a syntax error:
2.4 Statements 15
>>> 76trombones = ¡Çbig parade¡Ç SyntaxError: invalid syntax >>> more$ = 1000000 SyntaxError: invalid syntax >>> class = ¡ÇComputer Science 101¡Ç SyntaxError: invalid syntax 76trombones is illegal because it does not begin with a letter. more$ is illegal because it contains an illegal character, the dollar sign. But what¡Çs wrong with class? It turns out that class is one of the Python keywords. Keywords define the language¡Çs rules and structure, and they cannot be used as variable names. Python has twenty-nine keywords: and def exec if not return assert del finally import or try break elif for in pass while class else from is print yield continue except global lambda raise You might want to keep this list handy. If the interpreter complains about one of your variable names and you don¡Çt know why, see if it is on this list.
2.4 Statements
A statement is an instruction that the Python interpreter can execute. We have seen two kinds of statements: print and assignment. When you type a statement on the command line, Python executes it and displays the result, if there is one. The result of a print statement is a value. Assignment statements don¡Çt produce a result. A script usually contains a sequence of statements. If there is more than one statement, the results appear one at a time as the statements execute. For example, the script print 1 x = 2 print x produces the output 16 Variables, expressions and statements 1 2 Again, the assignment statement produces no output.
2.5 Evaluating expressions
An expression is a combination of values, variables, and operators. If you type an expression on the command line, the interpreter evaluates it and displays the result: >>> 1 + 1 2 Although expressions contain values, variables, and operators, not every expression contains all of these elements. A value all by itself is considered an expression,and so is a variable. >>> 17 17 >>> x 2 Confusingly, evaluating an expression is not quite the same thing as printing a value. >>> message = ¡ÇHello, World!¡Ç >>> message ¡ÇHello, World!¡Ç >>> print message Hello, World! When the Python interpreter displays the value of an expression, it uses the same format you would use to enter a value. In the case of strings, that means that it includes the quotation marks. But if you use a print statement, Python displays the contents of the string without the quotation marks. In a script, an expression all by itself is a legal statement, but it doesn¡Çt do anything. The script 17 3.2 ¡ÇHello, World!¡Ç 1 + 1 produces no output at all. How would you change the script to display the values of these four expressions?
2.6 Operators and operands 17
2.6 Operators and operands
Operators are special symbols that represent computations like addition and multiplication. The values the operator uses are called operands. The following are all legal Python expressions whose meaning is more or less clear: 20+32 hour-1 hour*60+minute minute/60 5**2 (5+9)*(15-7) The symbols +, -, and /, and the use of parenthesis for grouping, mean in Python what they mean in mathematics. The asterisk (*) is the symbol for multiplication,and ** is the symbol for exponentiation. When a variable name appears in the place of an operand, it is replaced with its value before the operation is performed. Addition, subtraction, multiplication, and exponentiation all do what you expect,but you might be surprised by division. The following operation has an unexpected result: >>> minute = 59 >>> minute/60 0 The value of minute is 59, and in conventional arithmetic 59 divided by 60 is 0.98333, not 0. The reason for the discrepancy is that Python is performing integer division. When both of the operands are integers, the result must also be an integer, and by convention, integer division always rounds down, even in cases like this where the next integer is very close. A possible solution to this problem is to calculate a percentage rather than a fraction: >>> minute*100/60 98 Again the result is rounded down, but at least now the answer is approximately correct. Another alternative is to use floating-point division, which we get to in
Chapter 3.
2.7 Order of operations
When more than one operator appears in an expression, the order of evaluation depends on the rules of precedence. Python follows the same precedence rules for its mathematical operators that athematics does. The acronym PEMDAS is a useful way to remember the order of operations:
18 Variables, expressions and statements
• Parentheses have the highest precedence and can be used to force an expression to evaluate in the order you want. Since expressions in parentheses are evaluated first, 2 * (3-1) is 4, and (1+1)**(5-2) is 8. You can also use parentheses to make an expression easier to read, as in (minute * 100) 60, even though it doesn¡Çt change the result. • Exponentiation has the next highest precedence, so 2**1+1 is 3 and not 4,and 3*1**3 is 3 and not 27. • Multiplication and Division have the same precedence, which is higher than Addition and Subtraction, which also have the same precedence. So 2*3-1yields 5 rather than 4, and 2/3-1 is -1, not 1 (remember that in integer division, 2/3=0). • Operators with the same precedence are evaluated from left to right. So in the expression minute*100/60, the multiplication happens first, yielding 5900/60, which in turn yields 98. If the operations had been evaluated from right to left, the result would have been 59*1, which is 59, which is wrong.
2.8 Operations on strings
In general, you cannot perform mathematical operations on strings, even if the strings look like numbers. The following are illegal (assuming that message has type string): message-1 ¡ÇHello¡Ç/123 message*¡ÇHello¡Ç ¡Ç15¡Ç+2 Interestingly, the + operator does work with strings, although it does not do exactly what you might expect. For strings, the + operator represents concatenation, which means joining the two operands by linking them end-to-end. For example: fruit = ¡Çbanana¡Ç bakedGood = ¡Ç nut bread¡Ç print fruit + bakedGood The output of this program is banana nut bread. The space before the word nut is part of the string, and is necessary to produce the space between the concatenated strings. The * operator also works on strings; it performs repetition. For example, ¡ÇFun¡Ç*3 is ¡ÇFunFunFun¡Ç. One of the operands has to be a string; the other has to be an integer.
2.9 Composition 19
On one hand, this interpretation of + and * makes sense by analogy with addition and multiplication. Just as 4*3 is equivalent to 4+4+4, we expect ¡ÇFun¡Ç*3 to be the same as ¡ÇFun¡Ç+¡ÇFun¡Ç+¡ÇFun¡Ç, and it is. On the other hand, there is a significant way in which string concatenation and repetition are different from integer addition and multiplication. Can you think of a property that addition and multiplication have that string concatenation and repetition do not?
2.9 Composition
So far, we have looked at the elements of a program—variables, expressions, and statements—in isolation, without talking about how to combine them. One of the most useful features of programming languages is their ability to take small building blocks and compose them. For example, we know how to add numbers and we know how to print; it turns out we can do both at the same time: >>> print 17 + 3 20 In reality, the addition has to happen before the printing, so the actions aren¡Çt actually happening at the same time. The point is that any expression involving numbers, strings, and variables can be used inside a print statement. You¡Çve already seen an example of this: print ¡ÇNumber of minutes since midnight: ¡Ç, hour*60+minute You can also put arbitrary expressions on the right-hand side of an assignment statement: percentage = (minute * 100) / 60 This ability may not seem impressive now, but you will see other examples where composition makes it possible to express complex computations neatly and concisely Warning: There are limits on where you can use certain expressions. For example,the left-hand side of an assignment statement has to be a variable name, not an expression. So, the following is illegal: minute+1 = hour.
2.10 Comments
As programs get bigger and more complicated, they get more difficult to read. Formal languages are dense, and it is often difficult to look at a piece of code and figure out what it is doing, or why.20 Variables, expressions and statements For this reason, it is a good idea to add notes to your programs to explain in natural language what the program is doing. These notes are called comments,and they are marked with the # symbol: # compute the percentage of the hour that has elapsed percentage = (minute * 100) / 60 In this case, the comment appears on a line by itself. You can also put comments at the end of a line: percentage = (minute * 100) / 60 # caution: integer division Everything from the # to the end of the line is ignored—it has no effect on the program. The message is intended for the programmer or for future programmers who might use this code. In this case, it reminds the reader about the eversurprising behavior of integer division. This sort of comment is less necessary if you use the integer division operation,//. It has the same effect as the division operator1, but it signals that the effect is deliberate. percentage = (minute * 100) // 60 The integer division operator is like a comment that says, ¡ÈI know this is integer division, and I like it that way!¡É
2.11 Glossary
value: A number or string (or other thing to be named later) that can be stored in a variable or computed in an expression. type: A set of values. The type of a value determines how it can be used in expressions. So far, the types you have seen are integers (type int), floatingpoint numbers (type float), and strings (type string). floating-point: A format for representing numbers with fractional parts. variable: A name that refers to a value. statement: A section of code that represents a command or action. So far, the statements you have seen are assignments and print statements. assignment: A statement that assigns a value to a variable. 1For now. The behavior of the division operator may change in future versions of Python. 2.11 Glossary 21 state diagram: A graphical representation of a set of variables and the values to which they refer keyword: A reserved word that is used by the compiler to parse a program; you cannot use keywords like if, def, and while as variable names. operator: A special symbol that represents a simple computation like addition,multiplication, or string concatenation. operand: One of the values on which an operator operates. expression: A combination of variables, operators, and values that represents a single result value. evaluate: To simplify an expression by performing the operations in order to yield a single value. integer division: An operation that divides one integer by another and yields an integer. Integer division yields only the whole number of times that the numerator is divisible by the denominator and discards any remainder. rules of precedence: The set of rules governing the order in which expressions involving multiple operators and operands are evaluated. concatenate: To join two operands end-to-end. composition: The ability to combine simple expressions and statements into compound statements and expressions in order to represent complex computations concisely. comment: Information in a program that is meant for other programmers (or anyone reading the source code) and has no effect on the execution of the program. 22 Variables, expressions and statements Chapter 3 Functions 3.1 Function calls You have already seen one example of a function call: >>> type("32") <type ¡Çstr¡Ç> The name of the function is type, and it displays the type of a value or variable. The value or variable, which is called the argument of the function, has to be enclosed in parentheses. It is common to say that a function ¡Ètakes¡É an argument and ¡Èreturns¡É a result. The result is called the return value. Instead of printing the return value, we could assign it to a variable: >>> betty = type("32") >>> print betty <type ¡Çstr¡Ç> As another example, the id function takes a value or a variable and returns an integer that acts as a unique identifier for the value: >>> id(3) 134882108 >>> betty = 3 >>> id(betty) 134882108 Every value has an id, which is a unique number related to where it is stored in the memory of the computer. The id of a variable is the id of the value to which it refers. 24 Functions 3.2 Type conversion Python provides a collection of built-in functions that convert values from one type to another. The int function takes any value and converts it to an integer, if possible, or complains otherwise: >>> int("32") 32 >>> int("Hello") ValueError: invalid literal for int(): Hello int can also convert floating-point values to integers, but remember that it truncates the fractional part: >>> int(3.99999) 3 >>> int(-2.3) -2 The float function converts integers and strings to floating-point numbers: >>> float(32) 32.0 >>> float("3.14159") 3.14159 Finally, the str function converts to type string: >>> str(32) ¡Ç32¡Ç >>> str(3.14149) ¡Ç3.14149¡Ç It may seem odd that Python distinguishes the integer value 1 from the floatingpoint value 1.0. They may represent the same number, but they belong to different types. The reason is that they are represented differently inside the computer. 3.3 Type coercion Now that we can convert between types, we have another way to deal with integer division. Returning to the example from the previous chapter, suppose we want to calculate the fraction of an hour that has elapsed. The most obvious expression, minute / 60, does integer arithmetic, so the result is always 0, even at 59 minutes past the hour. One solution is to convert minute to floating-point and do floating-point division: 3.4 Math functions 25 >>> minute = 59 >>> float(minute) / 60 0.983333333333 Alternatively, we can take advantage of the rules for automatic type conversion, which is called type coercion. For the mathematical operators, if either operand is a float, the other is automatically converted to a float: >>> minute = 59 >>> minute / 60.0 0.983333333333 By making the denominator a float, we force Python to do floating-point division. 3.4 Math functions In mathematics, you have probably seen functions like sin and log, and you have learned to evaluate expressions like sin(pi/2) and log(1/x). First, you evaluate the expression in parentheses (the argument). For example, pi/2 is approximately 1.571, and 1/x is 0.1 (if x happens to be 10.0). Then, you evaluate the function itself, either by looking it up in a table or by performing various computations. The sin of 1.571 is 1, and the log of 0.1 is -1 (assuming that log indicates the logarithm base 10). This process can be applied repeatedly to evaluate more complicated expressions like log(1/sin(pi/2)). First, you evaluate the argument of the innermost function, then evaluate the function, and so on. Python has a math module that provides most of the familiar mathematical functions. A module is a file that contains a collection of related functions grouped together. Before we can use the functions from a module, we have to import them: >>> import math To call one of the functions, we have to specify the name of the module and the name of the function, separated by a dot, also known as a period. This format is called dot notation. >>> decibel = math.log10 (17.0) >>> angle = 1.5 >>> height = math.sin(angle) 26 Functions The first statement sets decibel to the logarithm of 17, base 10. There is also a function called log that takes logarithm base e. The third statement finds the sine of the value of the variable angle. sin and the other trigonometric functions (cos, tan, etc.) take arguments in radians. To convert from degrees to radians, divide by 360 and multiply by 2*pi. For example, to find the sine of 45 degrees, first calculate the angle in radians and then take the sine: >>> degrees = 45 >>> angle = degrees * 2 * math.pi / 360.0 >>> math.sin(angle) 0.707106781187 The constant pi is also part of the math module. If you know your geometry, you can check the previous result by comparing it to the square root of two divided by two: >>> math.sqrt(2) / 2.0 0.707106781187 3.5 Composition Just as with mathematical functions, Python functions can be composed, meaning that you use one expression as part of another. For example, you can use any expression as an argument to a function: >>> x = math.cos(angle + math.pi/2) This statement takes the value of pi, divides it by 2, and adds the result to the value of angle. The sum is then passed as an argument to the cos function. You can also take the result of one function and pass it as an argument to another: >>> x = math.exp(math.log(10.0)) This statement finds the log base e of 10 and then raises e to that power. The result gets assigned to x. 3.6 Adding new functions So far, we have only been using the functions that come with Python, but it is also possible to add new functions. Creating new functions to solve your particular 3.6 Adding new functions 27 problems is one of the most useful things about a general-purpose programming language. In the context of programming, a function is a named sequence of statements that performs a desired operation. This operation is specified in a function definition. The functions we have been using so far have been defined for us, and these definitions have been hidden. This is a good thing, because it allows us to use the functions without worrying about the details of their definitions. The syntax for a function definition is: def NAME( LIST OF PARAMETERS ): STATEMENTS You can make up any names you want for the functions you create, except that you can¡Çt use a name that is a Python keyword. The list of parameters specifies what information, if any, you have to provide in order to use the new function. There can be any number of statements inside the function, but they have to be indented from the left margin. In the examples in this book, we will use an indentation of two spaces. The first couple of functions we are going to write have no parameters, so the syntax looks like this: def newLine(): print This function is named newLine. The empty parentheses indicate that it has no parameters. It contains only a single statement, which outputs a newline character. (That¡Çs what happens when you use a print command without any arguments.) The syntax for calling the new function is the same as the syntax for built-in functions: print "First Line." newLine() print "Second Line." The output of this program is: First line. Second line. Notice the extra space between the two lines. What if we wanted more space between the lines? We could call the same function repeatedly: 28 Functions print "First Line." newLine() newLine() newLine() print "Second Line." Or we could write a new function named threeLines that prints three new lines: def threeLines(): newLine() newLine() newLine() print "First Line." threeLines() print "Second Line." This function contains three statements, all of which are indented by two spaces. Since the next statement is not indented, Python knows that it is not part of the function. You should notice a few things about this program: 1. You can call the same procedure repeatedly. In fact, it is quite common and useful to do so. 2. You can have one function call another function; in this case threeLines calls newLine. So far, it may not be clear why it is worth the trouble to create all of these new functions. Actually, there are a lot of reasons, but this example demonstrates two: • Creating a new function gives you an opportunity to name a group of statements. Functions can simplify a program by hiding a complex computation behind a single command and by using English words in place of arcane code. • Creating a new function can make a program smaller by eliminating repetitive code. For example, a short way to print nine consecutive new lines is to call threeLines three times. As an exercise, write a function called nineLines that uses threeLines to print nine blank lines. How would you print twentyseven new lines? 3.7 Definitions and use 29 3.7 Definitions and use Pulling together the code fragments from Section 3.6, the whole program looks like this: def newLine(): print def threeLines(): newLine() newLine() newLine() print "First Line." threeLines() print "Second Line." This program contains two function definitions: newLine and threeLines. Function definitions get executed just like other statements, but the effect is to create the new function. The statements inside the function do not get executed until the function is called, and the function definition generates no output. As you might expect, you have to create a function before you can execute it. In other words, the function definition has to be executed before the first time it is called. As an exercise, move the last three lines of this program to the top, so the function calls appear before the definitions. Run the program and see what error message you get. As another exercise, start with the working version of the program and move the definition of newLine after the definition of threeLines. What happens when you run this program? 3.8 Flow of execution In order to ensure that a function is defined before its first use, you have to know the order in which statements are executed, which is called the flow of execution. Execution always begins at the first statement of the program. Statements are executed one at a time, in order from top to bottom. Function definitions do not alter the flow of execution of the program, but remember that statements inside the function are not executed until the function 30 Functions is called. Although it is not common, you can define one function inside another. In this case, the inner definition isn¡Çt executed until the outer function is called. Function calls are like a detour in the flow of execution. Instead of going to the next statement, the flow jumps to the first line of the called function, executes all the statements there, and then comes back to pick up where it left off. That sounds simple enough, until you remember that one function can call another. While in the middle of one function, the program might have to execute the statements in another function. But while executing that new function, the program might have to execute yet another function! Fortunately, Python is adept at keeping track of where it is, so each time a function completes, the program picks up where it left off in the function that called it. When it gets to the end of the program, it terminates. What¡Çs the moral of this sordid tale? When you read a program, don¡Çt read from top to bottom. Instead, follow the flow of execution. 3.9 Parameters and arguments Some of the built-in functions you have used require arguments, the values that control how the function does its job. For example, if you want to find the sine of a number, you have to indicate what the number is. Thus, sin takes a numeric value as an argument. Some functions take more than one argument. For example, pow takes two arguments, the base and the exponent. Inside the function, the values that are passed get assigned to variables called parameters. Here is an example of a user-defined function that has a parameter: def printTwice(bruce): print bruce, bruce This function takes a single argument and assigns it to a parameter named bruce. The value of the parameter (at this point we have no idea what it will be) is printed twice, followed by a newline. The name bruce was chosen to suggest that the name you give a parameter is up to you, but in general, you want to choose something more illustrative than bruce. The function printTwice works for any type that can be printed: >>> printTwice(¡ÇSpam¡Ç) Spam Spam 3.10 Variables and parameters are local 31 >>> printTwice(5) 5 5 >>> printTwice(3.14159) 3.14159 3.14159 In the first function call, the argument is a string. In the second, it¡Çs an integer. In the third, it¡Çs a float. The same rules of composition that apply to built-in functions also apply to user-defined functions, so we can use any kind of expression as an argument for printTwice: >>> printTwice(¡ÇSpam¡Ç*4) SpamSpamSpamSpam SpamSpamSpamSpam >>> printTwice(math.cos(math.pi)) -1.0 -1.0 As usual, the expression is evaluated before the function is run, so printTwice prints SpamSpamSpamSpam SpamSpamSpamSpam instead of ¡ÇSpam¡Ç*4 ¡ÇSpam¡Ç*4. As an exercise, write a call to printTwice that does print ¡ÇSpam¡Ç*4 ¡ÇSpam¡Ç*4. Hint: strings can be enclosed in either single or double quotes, and the type of quote not used to enclose the string can be used inside it as part of the string. We can also use a variable as an argument: >>> michael = ¡ÇEric, the half a bee.¡Ç >>> printTwice(michael) Eric, the half a bee. Eric, the half a bee. Notice something very important here. The name of the variable we pass as an argument (michael) has nothing to do with the name of the parameter (bruce). It doesn¡Çt matter what the value was called back home (in the caller); here in printTwice, we call everybody bruce. 3.10 Variables and parameters are local When you create a local variable inside a function, it only exists inside the function, and you cannot use it outside. For example: def catTwice(part1, part2): cat = part1 + part2 printTwice(cat) 32 Functions This function takes two arguments, concatenates them, and then prints the result twice. We can call the function with two strings: >>> chant1 = "Pie Jesu domine, " >>> chant2 = "Dona eis requiem." >>> catTwice(chant1, chant2) Pie Jesu domine, Dona eis requiem. Pie Jesu domine, Dona eis requiem. When catTwice terminates, the variable cat is destroyed. If we try to print it, we get an error: >>> print cat NameError: cat Parameters are also local. For example, outside the function printTwice, there is no such thing as bruce. If you try to use it, Python will complain. 3.11 Stack diagrams To keep track of which variables can be used where, it is sometimes useful to draw a stack diagram. Like state diagrams, stack diagrams show the value of each variable, but they also show the function to which each variable belongs. Each function is represented by a frame. A frame is a box with the name of a function beside it and the parameters and variables of the function inside it. The stack diagram for the previous example looks like this: catTwice chant1 chant2 "Pie Jesu domine," "Dona eis requiem." __main__ printTwice part1 part2 cat "Pie Jesu domine," "Pie Jesu domine, Dona eis requiem." "Dona eis requiem." bruce "Pie Jesu domine, Dona eis requiem." 3.12 Functions with results 33 The order of the stack shows the flow of execution. printTwice was called by catTwice, and catTwice was called by main , which is a special name for the topmost function. When you create a variable outside of any function, it belongs to main . Each parameter refers to the same value as its corresponding argument. So, part1 has the same value as chant1, part2 has the same value as chant2, and bruce has the same value as cat. If an error occurs during a function call, Python prints the name of the function, and the name of the function that called it, and the name of the function that called that, all the way back to main . For example, if we try to access cat from within printTwice, we get a NameError: Traceback (innermost last): File "test.py", line 13, in __main__ catTwice(chant1, chant2) File "test.py", line 5, in catTwice printTwice(cat) File "test.py", line 9, in printTwice print cat NameError: cat This list of functions is called a traceback. It tells you what program file the error occurred in, and what line, and what functions were executing at the time. It also shows the line of code that caused the error. Notice the similarity between the traceback and the stack diagram. It¡Çs not a coincidence. 3.12 Functions with results You might have noticed by now that some of the functions we are using, such as the math functions, yield results. Other functions, like newLine, perform an action but don¡Çt return a value. That raises some questions: 1. What happens if you call a function and you don¡Çt do anything with the result (i.e., you don¡Çt assign it to a variable or use it as part of a larger expression)? 2. What happens if you use a function without a result as part of an expression, such as newLine() + 7? 34 Functions 3. Can you write functions that yield results, or are you stuck with simple function like newLine and printTwice? The answer to the last question is that you can write functions that yield results, and we¡Çll do it in Chapter 5. As an exercise, answer the other two questions by trying them out. When you have a question about what is legal or illegal in Python, a good way to find out is to ask the interpreter. 3.13 Glossary function call: A statement that executes a function. It consists of the name of the function followed by a list of arguments enclosed in parentheses. argument: A value provided to a function when the function is called. This value is assigned to the corresponding parameter in the function. return value: The result of a function. If a function call is used as an expression, the return value is the value of the expression. type conversion: An explicit statement that takes a value of one type and computes a corresponding value of another type. type coercion: A type conversion that happens automatically according to Python¡Çs coercion rules. module: A file that contains a collection of related functions and classes. dot notation: The syntax for calling a function in another module, specifying the module name followed by a dot (period) and the function name. function: A named sequence of statements that performs some useful operation. Functions may or may not take arguments and may or may not produce a result. function definition: A statement that creates a new function, specifying its name, parameters, and the statements it executes. flow of execution: The order in which statements are executed during a program run. parameter: A name used inside a function to refer to the value passed as an argument. 3.13 Glossary 35 local variable: A variable defined inside a function. A local variable can only be used inside its function. stack diagram: A graphical representation of a stack of functions, their variables, and the values to which they refer. frame: A box in a stack diagram that represents a function call. It contains the local variables and parameters of the function. traceback: A list of the functions that are executing, printed when a runtime error occurs. 36 Functions Chapter 4 Conditionals and recursion 4.1 The modulus operator The modulus operator works on integers (and integer expressions) and yields the remainder when the first operand is divided by the second. In Python, the modulus operator is a percent sign (%). The syntax is the same as for other operators: >>> quotient = 7 / 3 >>> print quotient 2 >>> remainder = 7 % 3 >>> print remainder 1 So 7 divided by 3 is 2 with 1 left over. The modulus operator turns out to be surprisingly useful. For example, you can check whether one number is divisible by another—if x % y is zero, then x is divisible by y. Also, you can extract the right-most digit or digits from a number. For example, x % 10 yields the right-most digit of x (in base 10). Similarly x % 100 yields the last two digits. 4.2 Boolean expressions A boolean expression is an expression that is either true or false. One way to write a boolean expression is to use the operator ==, which compares two values and produces a boolean value: 38 Conditionals and recursion >>> 5 == 5 True >>> 5 == 6 False In the first statement, the two operands are equal, so the value of the expression is True; in the second statement, 5 is not equal to 6, so we get False. True and False are special values that are built into Python. The == operator is one of the comparison operators; the others are: x != y # x is not equal to y x > y # x is greater than y x < y # x is less than y x >= y # x is greater than or equal to y x <= y # x is less than or equal to y Although these operations are probably familiar to you, the Python symbols are different from the mathematical symbols. A common error is to use a single equal sign (=) instead of a double equal sign (==). Remember that = is an assignment operator and == is a comparison operator. Also, there is no such thing as =< or =>. 4.3 Logical operators There are three logical operators: and, or, and not. The semantics (meaning) of these operators is similar to their meaning in English. For example, x > 0 and x < 10 is true only if x is greater than 0 and less than 10. n%2 == 0 or n%3 == 0 is true if either of the conditions is true, that is, if the number is divisible by 2 or 3. Finally, the not operator negates a boolean expression, so not(x > y) is true if (x > y) is false, that is, if x is less than or equal to y. Strictly speaking, the operands of the logical operators should be boolean expressions, but Python is not very strict. Any nonzero number is interpreted as ¡Ètrue.¡É >>> x = 5 >>> x and 1 1 >>> y = 0 >>> y and 1 0 4.4 Conditional execution 39 In general, this sort of thing is not considered good style. If you want to compare a value to zero, you should do it explicitly. 4.4 Conditional execution In order to write useful programs, we almost always need the ability to check conditions and change the behavior of the program accordingly. Conditional statements give us this ability. The simplest form is the if statement: if x > 0: print "x is positive" The boolean expression after the if statement is called the condition. If it is true, then the indented statement gets executed. If not, nothing happens. Like other compound statements, the if statement is made up of a header and a block of statements: HEADER: FIRST STATEMENT ... LAST STATEMENT The header begins on a new line and ends with a colon (:). The indented statements that follow are called a block. The first unindented statement marks the end of the block. A statement block inside a compound statement is called the body of the statement. There is no limit on the number of statements that can appear in the body of an if statement, but there has to be at least one. Occasionally, it is useful to have a body with no statements (usually as a place keeper for code you haven¡Çt written yet). In that case, you can use the pass statement, which does nothing. 4.5 Alternative execution A second form of the if statement is alternative execution, in which there are two possibilities and the condition determines which one gets executed. The syntax looks like this: if x%2 == 0: print x, "is even" else: print x, "is odd" 40 Conditionals and recursion If the remainder when x is divided by 2 is 0, then we know that x is even, and the program displays a message to that effect. If the condition is false, the second set of statements is executed. Since the condition must be true or false, exactly one of the alternatives will be executed. The alternatives are called branches, because they are branches in the flow of execution. As an aside, if you need to check the parity (evenness or oddness) of numbers often, you might ¡Èwrap¡É this code in a function: def printParity(x): if x%2 == 0: print x, "is even" else: print x, "is odd" For any value of x, printParity displays an appropriate message. When you call it, you can provide any integer expression as an argument. >>> printParity(17) 17 is odd >>> y = 17 >>> printParity(y+1) 18 is even 4.6 Chained conditionals Sometimes there are more than two possibilities and we need more than two branches. One way to express a computation like that is a chained conditional: if x < y: print x, "is less than", y elif x > y: print x, "is greater than", y else: print x, "and", y, "are equal" elif is an abbreviation of ¡Èelse if.¡É Again, exactly one branch will be executed. There is no limit of the number of elif statements, but the last branch has to be an else statement: if choice == ¡ÇA¡Ç: functionA() elif choice == ¡ÇB¡Ç: functionB() elif choice == ¡ÇC¡Ç: functionC() else: print "Invalid choice." 4.7 Nested conditionals 41 Each condition is checked in order. If the first is false, the next is checked, and so on. If one of them is true, the corresponding branch executes, and the statement ends. Even if more than one condition is true, only the first true branch executes. As an exercise, wrap these examples in functions called compare(x, y) and dispatch(choice). 4.7 Nested conditionals One conditional can also be nested within another. We could have written the trichotomy example as follows: if x == y: print x, "and", y, "are equal" else: if x < y: print x, "is less than", y else: print x, "is greater than", y The outer conditional contains two branches. The first branch contains a simple output statement. The second branch contains another if statement, which has two branches of its own. Those two branches are both output statements, although they could have been conditional statements as well. Although the indentation of the statements makes the structure apparent, nested conditionals become difficult to read very quickly. In general, it is a good idea to avoid them when you can. Logical operators often provide a way to simplify nested conditional statements. For example, we can rewrite the following code using a single conditional: if 0 < x: if x < 10: print "x is a positive single digit." The print statement is executed only if we make it past both the conditionals, so we can use the and operator: if 0 < x and x < 10: print "x is a positive single digit." These kinds of conditions are common, so Python provides an alternative syntax that is similar to mathematical notation: 42 Conditionals and recursion if 0 < x < 10: print "x is a positive single digit." This condition is semantically the same as the compound boolean expression and the nested conditional. 4.8 The return statement The return statement allows you to terminate the execution of a function before you reach the end. One reason to use it is if you detect an error condition: import math def printLogarithm(x): if x <= 0: print "Positive numbers only, please." return result = math.log(x) print "The log of x is", result The function printLogarithm has a parameter named x. The first thing it does is check whether x is less than or equal to 0, in which case it displays an error message and then uses return to exit the function. The flow of execution immediately returns to the caller, and the remaining lines of the function are not executed. Remember that to use a function from the math module, you have to import it. 4.9 Recursion We mentioned that it is legal for one function to call another, and you have seen several examples of that. We neglected to mention that it is also legal for a function to call itself. It may not be obvious why that is a good thing, but it turns out to be one of the most magical and interesting things a program can do. For example, look at the following function: def countdown(n): if n == 0: print "Blastoff!" else: print n countdown(n-1) 4.9 Recursion 43 countdown expects the parameter, n, to be a positive integer. If n is 0, it outputs the word, ¡ÈBlastoff!¡É Otherwise, it outputs n and then calls a function named countdown—itself—passing n-1 as an argument. What happens if we call this function like this: >>> countdown(3) The execution of countdown begins with n=3, and since n is not 0, it outputs the value 3, and then calls itself... The execution of countdown begins with n=2, and since n is not 0, it outputs the value 2, and then calls itself... The execution of countdown begins with n=1, and since n is not 0, it outputs the value 1, and then calls itself... The execution of countdown begins with n=0, and since n is 0, it outputs the word, ¡ÈBlastoff!¡É and then returns. The countdown that got n=1 returns. The countdown that got n=2 returns. The countdown that got n=3 returns. And then you¡Çre back in main (what a trip). So, the total output looks like this: 3 2 1 Blastoff! As a second example, look again at the functions newLine and threeLines: def newline(): print def threeLines(): newLine() newLine() newLine() Although these work, they would not be much help if we wanted to output 2 newlines, or 106. A better alternative would be this: 44 Conditionals and recursion def nLines(n): if n > 0: print nLines(n-1) This program is similar to countdown; as long as n is greater than 0, it outputs one newline and then calls itself to output n-1 additional newlines. Thus, the total number of newlines is 1 + (n - 1) which, if you do your algebra right, comes out to n. The process of a function calling itself is recursion, and such functions are said to be recursive. 4.10 Stack diagrams for recursive functions In Section 3.11, we used a stack diagram to represent the state of a program during a function call. The same kind of diagram can help interpret a recursive function. Every time a function gets called, Python creates a new function frame, which contains the function¡Çs local variables and parameters. For a recursive function, there might be more than one frame on the stack at the same time. This figure shows a stack diagram for countdown called with n = 3: __main__ countdown countdown countdown countdown n 3 n 2 n 1 n 0 As usual, the top of the stack is the frame for main . It is empty because we did not create any variables in main or pass any arguments to it. The four countdown frames have different values for the parameter n. The bottom of the stack, where n=0, is called the base case. It does not make a recursive call, so there are no more frames. As an exercise, draw a stack diagram for nLines called with n=4. 4.11 Infinite recursion 45 4.11 Infinite recursion If a recursion never reaches a base case, it goes on making recursive calls forever, and the program never terminates. This is known as infinite recursion, and it is generally not considered a good idea. Here is a minimal program with an infinite recursion: def recurse(): recurse() In most programming environments, a program with infinite recursion does not really run forever. Python reports an error message when the maximum recursion depth is reached: File "<stdin>", line 2, in recurse (98 repetitions omitted) File "<stdin>", line 2, in recurse RuntimeError: Maximum recursion depth exceeded This traceback is a little bigger than the one we saw in the previous chapter. When the error occurs, there are 100 recurse frames on the stack! As an exercise, write a function with infinite recursion and run it in the Python interpreter. 4.12 Keyboard input The programs we have written so far are a bit rude in the sense that they accept no input from the user. They just do the same thing every time. Python provides built-in functions that get input from the keyboard. The simplest is called raw input. When this function is called, the program stops and waits for the user to type something. When the user presses Return or the Enter key, the program resumes and raw input returns what the user typed as a string: >>> input = raw_input () What are you waiting for? >>> print input What are you waiting for? Before calling raw input, it is a good idea to print a message telling the user what to input. This message is called a prompt. We can supply a prompt as an argument to raw input: 46 Conditionals and recursion >>> name = raw_input ("What...is your name? ") What...is your name? Arthur, King of the Britons! >>> print name Arthur, King of the Britons! If we expect the response to be an integer, we can use the input function: prompt = "What...is the airspeed velocity of an unladen swallow?\n" speed = input(prompt) The sequence \n at the end of the string represents a newline, so the user¡Çs input appears below the prompt. If the user types a string of digits, it is converted to an integer and assigned to speed. Unfortunately, if the user types a character that is not a digit, the program crashes: >>> speed = input (prompt) What...is the airspeed velocity of an unladen swallow? What do you mean, an African or a European swallow? SyntaxError: invalid syntax To avoid this kind of error, it is generally a good idea to use raw input to get a string and then use conversion functions to convert to other types. 4.13 Glossary modulus operator: An operator, denoted with a percent sign (%), that works on integers and yields the remainder when one number is divided by another. boolean expression: An expression that is either true or false. comparison operator: One of the operators that compares two values: ==, !=, >, <, >=, and <=. logical operator: One of the operators that combines boolean expressions: and, or, and not. conditional statement: A statement that controls the flow of execution depending on some condition. condition: The boolean expression in a conditional statement that determines which branch is executed. compound statement: A statement that consists of a header and a body. The header ends with a colon (:). The body is indented relative to the header. 4.13 Glossary 47 block: A group of consecutive statements with the same indentation. body: The block in a compound statement that follows the header. nesting: One program structure within another, such as a conditional statement inside a branch of another conditional statement. recursion: The process of calling the function that is currently executing. base case: A branch of the conditional statement in a recursive function that does not result in a recursive call. infinite recursion: A function that calls itself recursively without ever reaching the base case. Eventually, an infinite recursion causes a runtime error. prompt: A visual cue that tells the user to input data. 48 Conditionals and recursion Chapter 5 Fruitful functions 5.1 Return values Some of the built-in functions we have used, such as the math functions, have produced results. Calling the function generates a new value, which we usually assign to a variable or use as part of an expression. e = math.exp(1.0) height = radius * math.sin(angle) But so far, none of the functions we have written has returned a value. In this chapter, we are going to write functions that return values, which we will call fruitful functions, for want of a better name. The first example is area, which returns the area of a circle with the given radius: import math def area(radius): temp = math.pi * radius**2 return temp We have seen the return statement before, but in a fruitful function the return statement includes a return value. This statement means: ¡ÈReturn immediately from this function and use the following expression as a return value.¡É The expression provided can be arbitrarily complicated, so we could have written this function more concisely: def area(radius): return math.pi * radius**2 50 Fruitful functions On the other hand, temporary variables like temp often make debugging easier. Sometimes it is useful to have multiple return statements, one in each branch of a conditional: def absoluteValue(x): if x < 0: return -x else: return x Since these return statements are in an alternative conditional, only one will be executed. As soon as one is executed, the function terminates without executing any subsequent statements. Code that appears after a return statement, or any other place the flow of execution can never reach, is called dead code. In a fruitful function, it is a good idea to ensure that every possible path through the program hits a return statement. For example: def absoluteValue(x): if x < 0: return -x elif x > 0: return x This program is not correct because if x happens to be 0, neither condition is true, and the function ends without hitting a return statement. In this case, the return value is a special value called None: >>> print absoluteValue(0) None As an exercise, write a compare function that returns 1 if x > y, 0 if x == y, and -1 if x < y. 5.2 Program development At this point, you should be able to look at complete functions and tell what they do. Also, if you have been doing the exercises, you have written some small functions. As you write larger functions, you might start to have more difficulty, especially with runtime and semantic errors. To deal with increasingly complex programs, we are going to suggest a technique called incremental development. The goal of incremental development is to 5.2 Program development 51 avoid long debugging sessions by adding and testing only a small amount of code at a time. As an example, suppose you want to find the distance between two points, given by the coordinates (x1, y1) and (x2, y2). By the Pythagorean theorem, the distance is: distance = p(x2 − x1)2 + (y2 − y1)2 The first step is to consider what a distance function should look like in Python. In other words, what are the inputs (parameters) and what is the output (return value)? In this case, the two points are the inputs, which we can represent using four parameters. The return value is the distance, which is a floating-point value. Already we can write an outline of the function: def distance(x1, y1, x2, y2): return 0.0 Obviously, this version of the function doesn¡Çt compute distances; it always returns zero. But it is syntactically correct, and it will run, which means that we can test it before we make it more complicated. To test the new function, we call it with sample values: >>> distance(1, 2, 4, 6) 0.0 We chose these values so that the horizontal distance equals 3 and the vertical distance equals 4; that way, the result is 5 (the hypotenuse of a 3-4-5 triangle). When testing a function, it is useful to know the right answer. At this point we have confirmed that the function is syntactically correct, and we can start adding lines of code. After each incremental change, we test the function again. If an error occurs at any point, we know where it must be—in the last line we added. A logical first step in the computation is to find the differences x2−x1 and y2−y1. We will store those values in temporary variables named dx and dy and print them. def distance(x1, y1, x2, y2): dx = x2 - x1 dy = y2 - y1 print "dx is", dx print "dy is", dy return 0.0 52 Fruitful functions If the function is working, the outputs should be 3 and 4. If so, we know that the function is getting the right arguments and performing the first computation correctly. If not, there are only a few lines to check. Next we compute the sum of squares of dx and dy: def distance(x1, y1, x2, y2): dx = x2 - x1 dy = y2 - y1 dsquared = dx**2 + dy**2 print "dsquared is: ", dsquared return 0.0 Notice that we removed the print statements we wrote in the previous step. Code like that is called scaffolding because it is helpful for building the program but is not part of the final product. Again, we would run the program at this stage and check the output (which should be 25). Finally, if we have imported the math module, we can use the sqrt function to compute and return the result: def distance(x1, y1, x2, y2): dx = x2 - x1 dy = y2 - y1 dsquared = dx**2 + dy**2 result = math.sqrt(dsquared) return result If that works correctly, you are done. Otherwise, you might want to print the value of result before the return statement. When you start out, you should add only a line or two of code at a time. As you gain more experience, you might find yourself writing and debugging bigger chunks. Either way, the incremental development process can save you a lot of debugging time. The key aspects of the process are: 1. Start with a working program and make small incremental changes. At any point, if there is an error, you will know exactly where it is. 2. Use temporary variables to hold intermediate values so you can output and check them. 5.3 Composition 53 3. Once the program is working, you might want to remove some of the scaffolding or consolidate multiple statements into compound expressions, but only if it does not make the program difficult to read. As an exercise, use incremental development to write a function called hypotenuse that returns the length of the hypotenuse of a right triangle given the lengths of the two legs as arguments. Record each stage of the incremental development process as you go. 5.3 Composition As you should expect by now, you can call one function from within another. This ability is called composition. As an example, we¡Çll write a function that takes two points, the center of the circle and a point on the perimeter, and computes the area of the circle. Assume that the center point is stored in the variables xc and yc, and the perimeter point is in xp and yp. The first step is to find the radius of the circle, which is the distance between the two points. Fortunately, there is a function, distance, that does that: radius = distance(xc, yc, xp, yp) The second step is to find the area of a circle with that radius and return it: result = area(radius) return result Wrapping that up in a function, we get: def area2(xc, yc, xp, yp): radius = distance(xc, yc, xp, yp) result = area(radius) return result We called this function area2 to distinguish it from the area function defined earlier. There can only be one function with a given name within a given module. The temporary variables radius and result are useful for development and debugging, but once the program is working, we can make it more concise by composing the function calls: def area2(xc, yc, xp, yp): return area(distance(xc, yc, xp, yp)) 54 Fruitful functions As an exercise, write a function slope(x1, y1, x2, y2) that returns the slope of the line through the points (x1, y1) and (x2, y2). Then use this function in a function called intercept(x1, y1, x2, y2) that returns the y-intercept of the line through the points (x1, y1) and (x2, y2). 5.4 Boolean functions Functions can return boolean values, which is often convenient for hiding complicated tests inside functions. For example: def isDivisible(x, y): if x % y == 0: return True else: return False The name of this function is isDivisible. It is common to give boolean functions names that sound like yes/no questions. isDivisible returns either True or False to indicate whether the x is or is not divisible by y. We can make the function more concise by taking advantage of the fact that the condition of the if statement is itself a boolean expression. We can return it directly, avoiding the if statement altogether: def isDivisible(x, y): return x % y == 0 This session shows the new function in action: >>> isDivisible(6, 4) False >>> isDivisible(6, 3) True Boolean functions are often used in conditional statements: if isDivisible(x, y): print "x is divisible by y" else: print "x is not divisible by y" It might be tempting to write something like: if isDivisible(x, y) == True: 5.5 More recursion 55 But the extra comparison is unnecessary. As an exercise, write a function isBetween(x, y, z) that returns True if y ≤ x ≤ z or False otherwise. 5.5 More recursion So far, you have only learned a small subset of Python, but you might be interested to know that this subset is a complete programming language, which means that anything that can be computed can be expressed in this language. Any program ever written could be rewritten using only the language features you have learned so far (actually, you would need a few commands to control devices like the keyboard, mouse, disks, etc., but that¡Çs all). Proving that claim is a nontrivial exercise first accomplished by Alan Turing, one of the first computer scientists (some would argue that he was a mathematician, but a lot of early computer scientists started as mathematicians). Accordingly, it is known as the Turing Thesis. If you take a course on the Theory of Computation, you will have a chance to see the proof. To give you an idea of what you can do with the tools you have learned so far, we¡Çll evaluate a few recursively defined mathematical functions. A recursive definition is similar to a circular definition, in the sense that the definition contains a reference to the thing being defined. A truly circular definition is not very useful: frabjuous: An adjective used to describe something that is frabjuous. If you saw that definition in the dictionary, you might be annoyed. On the other hand, if you looked up the definition of the mathematical function factorial, you might get something like this: 0! = 1 n! = n(n − 1)! This definition says that the factorial of 0 is 1, and the factorial of any other value, n, is n multiplied by the factorial of n − 1. So 3! is 3 times 2!, which is 2 times 1!, which is 1 times 0!. Putting it all together, 3! equals 3 times 2 times 1 times 1, which is 6. If you can write a recursive definition of something, you can usually write a Python program to evaluate it. The first step is to decide what the parameters are for this function. With little effort, you should conclude that factorial has a single parameter: def factorial(n): 56 Fruitful functions If the argument happens to be 0, all we have to do is return 1: def factorial(n): if n == 0: return 1 Otherwise, and this is the interesting part, we have to make a recursive call to find the factorial of n − 1 and then multiply it by n: def factorial(n): if n == 0: return 1 else: recurse = factorial(n-1) result = n * recurse return result The flow of execution for this program is similar to the flow of countdown in Section 4.9. If we call factorial with the value 3: Since 3 is not 0, we take the second branch and calculate the factorial of n-1... Since 2 is not 0, we take the second branch and calculate the factorial of n-1... Since 1 is not 0, we take the second branch and calculate the factorial of n-1... Since 0 is 0, we take the first branch and return 1 without making any more recursive calls. The return value (1) is multiplied by n, which is 1, and the result is returned. The return value (1) is multiplied by n, which is 2, and the result is returned. The return value (2) is multiplied by n, which is 3, and the result, 6, becomes the return value of the function call that started the whole process. Here is what the stack diagram looks like for this sequence of function calls: 5.6 Leap of faith 57 n 3 recurse 2 recurse 1 recurse 1 return 1 return 2 return 6 __main__ factorial n 2 n 1 n 0 factorial factorial factorial 1 1 2 6 The return values are shown being passed back up the stack. In each frame, the return value is the value of result, which is the product of n and recurse. Notice that in the last frame, the local variables recurse and result do not exist, because the branch that creates them did not execute. 5.6 Leap of faith Following the flow of execution is one way to read programs, but it can quickly become labyrinthine. An alternative is what we call the ¡Èleap of faith.¡É When you come to a function call, instead of following the flow of execution, you assume that the function works correctly and returns the appropriate value. In fact, you are already practicing this leap of faith when you use built-in functions. When you call math.cos or math.exp, you don¡Çt examine the implementations of those functions. You just assume that they work because the people who wrote the built-in functions were good programmers. The same is true when you call one of your own functions. For example, in Section 5.4, we wrote a function called isDivisible that determines whether one number is divisible by another. Once we have convinced ourselves that this function is correct—by testing and examining the code—we can use the function without looking at the code again. The same is true of recursive programs. When you get to the recursive call, instead of following the flow of execution, you should assume that the recursive call works (yields the correct result) and then ask yourself, ¡ÈAssuming that I can find the factorial of n − 1, can I compute the factorial of n?¡É In this case, it is clear that you can, by multiplying by n. 58 Fruitful functions Of course, it¡Çs a bit strange to assume that the function works correctly when you haven¡Çt finished writing it, but that¡Çs why it¡Çs called a leap of faith! 5.7 One more example In the previous example, we used temporary variables to spell out the steps and to make the code easier to debug, but we could have saved a few lines: def factorial(n): if n == 0: return 1 else: return n * factorial(n-1) From now on, we will tend to use the more concise form, but we recommend that you use the more explicit version while you are developing code. When you have it working, you can tighten it up if you are feeling inspired. After factorial, the most common example of a recursively defined mathematical function is fibonacci, which has the following definition: fibonacci(0) = 1 fibonacci(1) = 1 fibonacci(n) = fibonacci(n − 1) + fibonacci(n − 2); Translated into Python, it looks like this: def fibonacci (n): if n == 0 or n == 1: return 1 else: return fibonacci(n-1) + fibonacci(n-2) If you try to follow the flow of execution here, even for fairly small values of n, your head explodes. But according to the leap of faith, if you assume that the two recursive calls work correctly, then it is clear that you get the right result by adding them together. 5.8 Checking types What happens if we call factorial and give it 1.5 as an argument? 5.8 Checking types 59 >>> factorial (1.5) RuntimeError: Maximum recursion depth exceeded It looks like an infinite recursion. But how can that be? There is a base case— when n == 0. The problem is that the values of n miss the base case. In the first recursive call, the value of n is 0.5. In the next, it is -0.5. From there, it gets smaller and smaller, but it will never be 0. We have two choices. We can try to generalize the factorial function to work with floating-point numbers, or we can make factorial check the type of its argument. The first option is called the gamma function and it¡Çs a little beyond the scope of this book. So we¡Çll go for the second. We can use the built-in function isinstance to verify the type of the argument. While we¡Çre at it, we also make sure the argument is positive: def factorial (n): if not isinstance(n, int): print "Factorial is only defined for integers." return -1 elif n < 0: print "Factorial is only defined for positive integers." return -1 elif n == 0: return 1 else: return n * factorial(n-1) Now we have three base cases. The first catches nonintegers. The second catches negative integers. In both cases, the program prints an error message and returns a special value, -1, to indicate that something went wrong: >>> factorial ("fred") Factorial is only defined for integers. -1 >>> factorial (-2) Factorial is only defined for positive integers. -1 If we get past both checks, then we know that n is a positive integer, and we can prove that the recursion terminates. This program demonstrates a pattern sometimes called a guardian. The first two conditionals act as guardians, protecting the code that follows from values that might cause an error. The guardians make it possible to prove the correctness of the code. 60 Fruitful functions 5.9 Glossary fruitful function: A function that yields a return value. return value: The value provided as the result of a function call. temporary variable: A variable used to store an intermediate value in a complex calculation. dead code: Part of a program that can never be executed, often because it appears after a return statement. None: A special Python value returned by functions that have no return statement, or a return statement without an argument. incremental development: A program development plan intended to avoid debugging by adding and testing only a small amount of code at a time. scaffolding: Code that is used during program development but is not part of the final version. guardian: A condition that checks for and handles circumstances that might cause an error. Chapter 6 Iteration 6.1 Multiple assignment As you may have discovered, it is legal to make more than one assignment to the same variable. A new assignment makes an existing variable refer to a new value (and stop referring to the old value). bruce = 5 print bruce, bruce = 7 print bruce The output of this program is 5 7, because the first time bruce is printed, his value is 5, and the second time, his value is 7. The comma at the end of the first print statement suppresses the newline after the output, which is why both outputs appear on the same line. Here is what multiple assignment looks like in a state diagram: 7 5 bruce With multiple assignment it is especially important to distinguish between an assignment operation and a statement of equality. Because Python uses the equal sign (=) for assignment, it is tempting to interpret a statement like a = b as a statement of equality. It is not! 62 Iteration First, equality is commutative and assignment is not. For example, in mathematics, if a = 7 then 7 = a. But in Python, the statement a = 7 is legal and 7 = a is not. Furthermore, in mathematics, a statement of equality is always true. If a = b now, then a will always equal b. In Python, an assignment statement can make two variables equal, but they don¡Çt have to stay that way: a = 5 b = a # a and b are now equal a = 3 # a and b are no longer equal The third line changes the value of a but does not change the value of b, so they are no longer equal. (In some programming languages, a different symbol is used for assignment, such as <- or :=, to avoid confusion.) Although multiple assignment is frequently helpful, you should use it with caution. If the values of variables change frequently, it can make the code difficult to read and debug. 6.2 The while statement Computers are often used to automate repetitive tasks. Repeating identical or similar tasks without making errors is something that computers do well and people do poorly. We have seen two programs, nLines and countdown, that use recursion to perform repetition, which is also called iteration. Because iteration is so common, Python provides several language features to make it easier. The first feature we are going to look at is the while statement. Here is what countdown looks like with a while statement: def countdown(n): while n > 0: print n n = n-1 print "Blastoff!" Since we removed the recursive call, this function is not recursive. You can almost read the while statement as if it were English. It means, ¡ÈWhile n is greater than 0, continue displaying the value of n and then reducing the value of n by 1. When you get to 0, display the word Blastoff!¡É More formally, here is the flow of execution for a while statement: 6.2 The while statement 63 1. Evaluate the condition, yielding 0 or 1. 2. If the condition is false (0), exit the while statement and continue execution at the next statement. 3. If the condition is true (1), execute each of the statements in the body and then go back to step 1. The body consists of all of the statements below the header with the same indentation. This type of flow is called a loop because the third step loops back around to the top. Notice that if the condition is false the first time through the loop, the statements inside the loop are never executed. The body of the loop should change the value of one or more variables so that eventually the condition becomes false and the loop terminates. Otherwise the loop will repeat forever, which is called an infinite loop. An endless source of amusement for computer scientists is the observation that the directions on shampoo, ¡ÈLather, rinse, repeat,¡É are an infinite loop. In the case of countdown, we can prove that the loop terminates because we know that the value of n is finite, and we can see that the value of n gets smaller each time through the loop, so eventually we have to get to 0. In other cases, it is not so easy to tell: def sequence(n): while n != 1: print n, if n%2 == 0: # n is even n = n/2 else: # n is odd n = n*3+1 The condition for this loop is n != 1, so the loop will continue until n is 1, which will make the condition false. Each time through the loop, the program outputs the value of n and then checks whether it is even or odd. If it is even, the value of n is divided by 2. If it is odd, the value is replaced by n*3+1. For example, if the starting value (the argument passed to sequence) is 3, the resulting sequence is 3, 10, 5, 16, 8, 4, 2, 1. Since n sometimes increases and sometimes decreases, there is no obvious proof that n will ever reach 1, or that the program terminates. For some particular values of n, we can prove termination. For example, if the starting value is a power of two, then the value of n will be even each time through the loop until it reaches 1. The previous example ends with such a sequence, starting with 16. 64 Iteration Particular values aside, the interesting question is whether we can prove that this program terminates for all positive values of n. So far, no one has been able to prove it or disprove it! As an exercise, rewrite the function nLines from Section 4.9 using iteration instead of recursion. 6.3 Tables One of the things loops are good for is generating tabular data. Before computers were readily available, people had to calculate logarithms, sines and cosines, and other mathematical functions by hand. To make that easier, mathematics books contained long tables listing the values of these functions. Creating the tables was slow and boring, and they tended to be full of errors. When computers appeared on the scene, one of the initial reactions was, ¡ÈThis is great! We can use the computers to generate the tables, so there will be no errors.¡É That turned out to be true (mostly) but shortsighted. Soon thereafter, computers and calculators were so pervasive that the tables became obsolete. Well, almost. For some operations, computers use tables of values to get an approximate answer and then perform computations to improve the approximation. In some cases, there have been errors in the underlying tables, most famously in the table the Intel Pentium used to perform floating-point division. Although a log table is not as useful as it once was, it still makes a good example of iteration. The following program outputs a sequence of values in the left column and their logarithms in the right column: x = 1.0 while x < 10.0: print x, ¡Ç\t¡Ç, math.log(x) x = x + 1.0 The string ¡Ç\t¡Ç represents a tab character. As characters and strings are displayed on the screen, an invisible marker called the cursor keeps track of where the next character will go. After a print statement, the cursor normally goes to the beginning of the next line. The tab character shifts the cursor to the right until it reaches one of the tab stops. Tabs are useful for making columns of text line up, as in the output of the previous program: 6.3 Tables 65 1.0 0.0 2.0 0.69314718056 3.0 1.09861228867 4.0 1.38629436112 5.0 1.60943791243 6.0 1.79175946923 7.0 1.94591014906 8.0 2.07944154168 9.0 2.19722457734 If these values seem odd, remember that the log function uses base e. Since powers of two are so important in computer science, we often want to find logarithms with respect to base 2. To do that, we can use the following formula: log2 x = loge x loge 2 Changing the output statement to: print x, ¡Ç\t¡Ç, math.log(x)/math.log(2.0) yields: 1.0 0.0 2.0 1.0 3.0 1.58496250072 4.0 2.0 5.0 2.32192809489 6.0 2.58496250072 7.0 2.80735492206 8.0 3.0 9.0 3.16992500144 We can see that 1, 2, 4, and 8 are powers of two because their logarithms base 2 are round numbers. If we wanted to find the logarithms of other powers of two, we could modify the program like this: x = 1.0 while x < 100.0: print x, ¡Ç\t¡Ç, math.log(x)/math.log(2.0) x = x * 2.0 Now instead of adding something to x each time through the loop, which yields an arithmetic sequence, we multiply x by something, yielding a geometric sequence. The result is: 66 Iteration 1.0 0.0 2.0 1.0 4.0 2.0 8.0 3.0 16.0 4.0 32.0 5.0 64.0 6.0 Because of the tab characters between the columns, the position of the second column does not depend on the number of digits in the first column. Logarithm tables may not be useful any more, but for computer scientists, knowing the powers of two is! As an exercise, modify this program so that it outputs the powers of two up to 65,536 (that¡Çs 216). Print it out and memorize it. The backslash character in ¡Ç\t¡Ç indicates the beginning of an escape sequence. Escape sequences are used to represent invisible characters like tabs and newlines. The sequence \n represents a newline. An escape sequence can appear anywhere in a string; in the example, the tab escape sequence is the only thing in the string. How do you think you represent a backslash in a string? As an exercise, write a single string that produces this output. 6.4 Two-dimensional tables A two-dimensional table is a table where you read the value at the intersection of a row and a column. A multiplication table is a good example. Let¡Çs say you want to print a multiplication table for the values from 1 to 6. A good way to start is to write a loop that prints the multiples of 2, all on one line: i = 1 while i <= 6: print 2*i, ¡Ç ¡Ç, i = i + 1 print 6.5 Encapsulation and generalization 67 The first line initializes a variable named i, which acts as a counter or loop variable. As the loop executes, the value of i increases from 1 to 6. When i is 7, the loop terminates. Each time through the loop, it displays the value of 2*i, followed by three spaces. Again, the comma in the print statement suppresses the newline. After the loop completes, the second print statement starts a new line. The output of the program is: 2 4 6 8 10 12 So far, so good. The next step is to encapsulate and generalize. 6.5 Encapsulation and generalization Encapsulation is the process of wrapping a piece of code in a function, allowing you to take advantage of all the things functions are good for. You have seen two examples of encapsulation: printParity in Section 4.5; and isDivisible in Section 5.4. Generalization means taking something specific, such as printing the multiples of 2, and making it more general, such as printing the multiples of any integer. This function encapsulates the previous loop and generalizes it to print multiples of n: def printMultiples(n): i = 1 while i <= 6: print n*i, ¡Ç\t¡Ç, i = i + 1 print To encapsulate, all we had to do was add the first line, which declares the name of the function and the parameter list. To generalize, all we had to do was replace the value 2 with the parameter n. If we call this function with the argument 2, we get the same output as before. With the argument 3, the output is: 3 6 9 12 15 18 With the argument 4, the output is: 4 8 12 16 20 24 68 Iteration By now you can probably guess how to print a multiplication table—by calling printMultiples repeatedly with different arguments. In fact, we can use another loop: i = 1 while i <= 6: printMultiples(i) i = i + 1 Notice how similar this loop is to the one inside printMultiples. All we did was replace the print statement with a function call. The output of this program is a multiplication table: 1 2 3 4 5 6 2 4 6 8 10 12 3 6 9 12 15 18 4 8 12 16 20 24 5 10 15 20 25 30 6 12 18 24 30 36 6.6 More encapsulation To demonstrate encapsulation again, let¡Çs take the code from the end of Section 6.5 and wrap it up in a function: def printMultTable(): i = 1 while i <= 6: printMultiples(i) i = i + 1 This process is a common development plan. We develop code by writing lines of code outside any function, or typing them in to the interpreter. When we get the code working, we extract it and wrap it up in a function. This development plan is particularly useful if you don¡Çt know, when you start writing, how to divide the program into functions. This approach lets you design as you go along. 6.7 Local variables 69 6.7 Local variables You might be wondering how we can use the same variable, i, in both printMultiples and printMultTable. Doesn¡Çt it cause problems when one of the functions changes the value of the variable? The answer is no, because the i in printMultiples and the i in printMultTable are not the same variable. Variables created inside a function definition are local; you can¡Çt access a local variable from outside its ¡Èhome¡É function. That means you are free to have multiple variables with the same name as long as they are not in the same function. The stack diagram for this program shows that the two variables named i are not the same variable. They can refer to different values, and changing one does not affect the other. i 2 1 3 n 3 i 2 1 printMultTable printMultiples The value of i in printMultTable goes from 1 to 6. In the diagram it happens to be 3. The next time through the loop it will be 4. Each time through the loop, printMultTable calls printMultiples with the current value of i as an argument. That value gets assigned to the parameter n. Inside printMultiples, the value of i goes from 1 to 6. In the diagram, it happens to be 2. Changing this variable has no effect on the value of i in printMultTable. It is common and perfectly legal to have different local variables with the same name. In particular, names like i and j are used frequently as loop variables. If you avoid using them in one function just because you used them somewhere else, you will probably make the program harder to read. 70 Iteration 6.8 More generalization As another example of generalization, imagine you wanted a program that would print a multiplication table of any size, not just the six-by-six table. You could add a parameter to printMultTable: def printMultTable(high): i = 1 while i <= high: printMultiples(i) i = i + 1 We replaced the value 6 with the parameter high. If we call printMultTable with the argument 7, it displays: 1 2 3 4 5 6 2 4 6 8 10 12 3 6 9 12 15 18 4 8 12 16 20 24 5 10 15 20 25 30 6 12 18 24 30 36 7 14 21 28 35 42 This is fine, except that we probably want the table to be square—with the same number of rows and columns. To do that, we add another parameter to printMultiples to specify how many columns the table should have. Just to be annoying, we call this parameter high, demonstrating that different functions can have parameters with the same name (just like local variables). Here¡Çs the whole program: def printMultiples(n, high): i = 1 while i <= high: print n*i, ¡Ç\t¡Ç, i = i + 1 print def printMultTable(high): i = 1 while i <= high: printMultiples(i, high) i = i + 1 Notice that when we added a new parameter, we had to change the first line of the function (the function heading), and we also had to change the place where 6.9 Functions 71 the function is called in printMultTable. As expected, this program generates a square seven-by-seven table: 1 2 3 4 5 6 7 2 4 6 8 10 12 14 3 6 9 12 15 18 21 4 8 12 16 20 24 28 5 10 15 20 25 30 35 6 12 18 24 30 36 42 7 14 21 28 35 42 49 When you generalize a function appropriately, you often get a program with capabilities you didn¡Çt plan. For example, you might notice that, because ab = ba, all the entries in the table appear twice. You could save ink by printing only half the table. To do that, you only have to change one line of printMultTable. Change printMultiples(i, high) to printMultiples(i, i) and you get 1 2 4 3 6 9 4 8 12 16 5 10 15 20 25 6 12 18 24 30 36 7 14 21 28 35 42 49 As an exercise, trace the execution of this version of printMultTable and figure out how it works. 6.9 Functions A few times now, we have mentioned ¡Èall the things functions are good for.¡É By now, you might be wondering what exactly those things are. Here are some of them: • Giving a name to a sequence of statements makes your program easier to read and debug. 72 Iteration • Dividing a long program into functions allows you to separate parts of the program, debug them in isolation, and then compose them into a whole. • Functions facilitate both recursion and iteration. • Well-designed functions are often useful for many programs. Once you write and debug one, you can reuse it. 6.10 Glossary multiple assignment: Making more than one assignment to the same variable during the execution of a program. iteration: Repeated execution of a set of statements using either a recursive function call or a loop. loop: A statement or group of statements that execute repeatedly until a terminating condition is satisfied. infinite loop: A loop in which the terminating condition is never satisfied. body: The statements inside a loop. loop variable: A variable used as part of the terminating condition of a loop. tab: A special character that causes the cursor to move to the next tab stop on the current line. newline: A special character that causes the cursor to move to the beginning of the next line. cursor: An invisible marker that keeps track of where the next character will be printed. escape sequence: An escape character (\) followed by one or more printable characters used to designate a nonprintable character. encapsulate: To divide a large complex program into components (like functions) and isolate the components from each other (by using local variables, for example). generalize: To replace something unnecessarily specific (like a constant value) with something appropriately general (like a variable or parameter). Generalization makes code more versatile, more likely to be reused, and sometimes even easier to write. development plan: A process for developing a program. In this chapter, we demonstrated a style of development based on developing code to do simple, specific things and then encapsulating and generalizing. Chapter 7 Strings 7.1 A compound data type So far we have seen three types: int, float, and string. Strings are qualitatively different from the other two because they are made up of smaller pieces— characters. Types that comprise smaller pieces are called compound data types. Depending on what we are doing, we may want to treat a compound data type as a single thing, or we may want to access its parts. This ambiguity is useful. The bracket operator selects a single character from a string. >>> fruit = "banana" >>> letter = fruit[1] >>> print letter The expression fruit[1] selects character number 1 from fruit. The variable letter refers to the result. When we display letter, we get a surprise: a The first letter of "banana" is not a. Unless you are a computer scientist. In that case you should think of the expression in brackets as an offset from the beginning of the string, and the offset of the first letter is zero. So b is the 0th letter (¡Èzero-eth¡É) of "banana", a is the 1th letter (¡Èone-eth¡É), and n is the 2th (¡Ètwo-eth¡É) letter. To get the first letter of a string, you just put 0, or any expression with the value 0, in the brackets: 74 Strings >>> letter = fruit[0] >>> print letter b The expression in brackets is called an index. An index specifies a member of an ordered set, in this case the set of characters in the string. The index indicates which one you want, hence the name. It can be any integer expression. 7.2 Length The len function returns the number of characters in a string: >>> fruit = "banana" >>> len(fruit) 6 To get the last letter of a string, you might be tempted to try something like this: length = len(fruit) last = fruit[length] # ERROR! That won¡Çt work. It causes the runtime error IndexError: string index out of range. The reason is that there is no 6th letter in "banana". Since we started counting at zero, the six letters are numbered 0 to 5. To get the last character, we have to subtract 1 from length: length = len(fruit) last = fruit[length-1] Alternatively, we can use negative indices, which count backward from the end of the string. The expression fruit[-1] yields the last letter, fruit[-2] yields the second to last, and so on. 7.3 Traversal and the for loop A lot of computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal. One way to encode a traversal is with a while statement: index = 0 while index < len(fruit): letter = fruit[index] print letter index = index + 1 7.3 Traversal and the for loop 75 This loop traverses the string and displays each letter on a line by itself. The loop condition is index < len(fruit), so when index is equal to the length of the string, the condition is false, and the body of the loop is not executed. The last character accessed is the one with the index len(fruit)-1, which is the last character in the string. As an exercise, write a function that takes a string as an argument and outputs the letters backward, one per line. Using an index to traverse a set of values is so common that Python provides an alternative, simpler syntax—the for loop: for char in fruit: print char Each time through the loop, the next character in the string is assigned to the variable char. The loop continues until no characters are left. The following example shows how to use concatenation and a for loop to generate an abecedarian series. ¡ÈAbecedarian¡É refers to a series or list in which the elements appear in alphabetical order. For example, in Robert McCloskey¡Çs book Make Way for Ducklings, the names of the ducklings are Jack, Kack, Lack, Mack, Nack, Ouack, Pack, and Quack. This loop outputs these names in order: prefixes = "JKLMNOPQ" suffix = "ack" for letter in prefixes: print letter + suffix The output of this program is: Jack Kack Lack Mack Nack Oack Pack Qack Of course, that¡Çs not quite right because ¡ÈOuack¡É and ¡ÈQuack¡É are misspelled. As an exercise, modify the program to fix this error. 76 Strings 7.4 String slices A segment of a string is called a slice. Selecting a slice is similar to selecting a character: >>> s = "Peter, Paul, and Mary" >>> print s[0:5] Peter >>> print s[7:11] Paul >>> print s[17:21] Mary The operator [n:m] returns the part of the string from the ¡Èn-eth¡É character to the ¡Èm-eth¡É character, including the first but excluding the last. This behavior is counterintuitive; it makes more sense if you imagine the indices pointing between the characters, as in the following diagram: fruit " b a n a n a " index 0 1 2 3 4 5 6 If you omit the first index (before the colon), the slice starts at the beginning of the string. If you omit the second index, the slice goes to the end of the string. Thus: >>> fruit = "banana" >>> fruit[:3] ¡Çban¡Ç >>> fruit[3:] ¡Çana¡Ç What do you think s[:] means? 7.5 String comparison The comparison operators work on strings. To see if two strings are equal: if word == "banana": print "Yes, we have no bananas!" 7.6 Strings are immutable 77 Other comparison operations are useful for putting words in alphabetical order: if word < "banana": print "Your word," + word + ", comes before banana." elif word > "banana": print "Your word," + word + ", comes after banana." else: print "Yes, we have no bananas!" You should be aware, though, that Python does not handle upper- and lowercase letters the same way that people do. All the uppercase letters come before all the lowercase letters. As a result: Your word, Zebra, comes before banana. A common way to address this problem is to convert strings to a standard format, such as all lowercase, before performing the comparison. A more difficult problem is making the program realize that zebras are not fruit. 7.6 Strings are immutable It is tempting to use the [] operator on the left side of an assignment, with the intention of changing a character in a string. For example: greeting = "Hello, world!" greeting[0] = ¡ÇJ¡Ç # ERROR! print greeting Instead of producing the output Jello, world!, this code produces the runtime error TypeError: object doesn¡Çt support item assignment. Strings are immutable, which means you can¡Çt change an existing string. The best you can do is create a new string that is a variation on the original: greeting = "Hello, world!" newGreeting = ¡ÇJ¡Ç + greeting[1:] print newGreeting The solution here is to concatenate a new first letter onto a slice of greeting. This operation has no effect on the original string. 78 Strings 7.7 A find function What does the following function do? def find(str, ch): index = 0 while index < len(str): if str[index] == ch: return index index = index + 1 return -1 In a sense, find is the opposite of the [] operator. Instead of taking an index and extracting the corresponding character, it takes a character and finds the index where that character appears. If the character is not found, the function returns -1. This is the first example we have seen of a return statement inside a loop. If str[index] == ch, the function returns immediately, breaking out of the loop prematurely. If the character doesn¡Çt appear in the string, then the program exits the loop normally and returns -1. This pattern of computation is sometimes called a ¡Èeureka¡É traversal because as soon as we find what we are looking for, we can cry ¡ÈEureka!¡É and stop looking. As an exercise, modify the find function so that it has a third parameter, the index in the string where it should start looking. 7.8 Looping and counting The following program counts the number of times the letter a appears in a string: fruit = "banana" count = 0 for char in fruit: if char == ¡Ça¡Ç: count = count + 1 print count This program demonstrates another pattern of computation called a counter. The variable count is initialized to 0 and then incremented each time an a is found. (To increment is to increase by one; it is the opposite of decrement, and unrelated to ¡Èexcrement,¡É which is a noun.) When the loop exits, count contains the result—the total number of a¡Çs. 7.9 The string module 79 As an exercise, encapsulate this code in a function named countLetters, and generalize it so that it accepts the string and the letter as arguments. As a second exercise, rewrite this function so that instead of traversing the string, it uses the three-parameter version of find from the previous. 7.9 The string module The string module contains useful functions that manipulate strings. As usual, we have to import the module before we can use it: >>> import string The string module includes a function named find that does the same thing as the function we wrote. To call it we have to specify the name of the module and the name of the function using dot notation. >>> fruit = "banana" >>> index = string.find(fruit, "a") >>> print index 1 This example demonstrates one of the benefits of modules—they help avoid collisions between the names of built-in functions and user-defined functions. By using dot notation we can specify which version of find we want. Actually, string.find is more general than our version. First, it can find substrings, not just characters: >>> string.find("banana", "na") 2 Also, it takes an additional argument that specifies the index it should start at: >>> string.find("banana", "na", 3) 4 Or it can take two additional arguments that specify a range of indices: >>> string.find("bob", "b", 1, 2) -1 In this example, the search fails because the letter b does not appear in the index range from 1 to 2 (not including 2). 80 Strings 7.10 Character classification It is often helpful to examine a character and test whether it is upper- or lowercase, or whether it is a character or a digit. The string module provides several constants that are useful for these purposes. The string string.lowercase contains all of the letters that the system considers to be lowercase. Similarly, string.uppercase contains all of the uppercase letters. Try the following and see what you get: >>> print string.lowercase >>> print string.uppercase >>> print string.digits We can use these constants and find to classify characters. For example, if find(lowercase, ch) returns a value other than -1, then ch must be lowercase: def isLower(ch): return string.find(string.lowercase, ch) != -1 Alternatively, we can take advantage of the in operator, which determines whether a character appears in a string: def isLower(ch): return ch in string.lowercase As yet another alternative, we can use the comparison operator: def isLower(ch): return ¡Ça¡Ç <= ch <= ¡Çz¡Ç If ch is between a and z, it must be a lowercase letter. As an exercise, discuss which version of isLower you think will be fastest. Can you think of other reasons besides speed to prefer one or the other? Another constant defined in the string module may surprise you when you print it: >>> print string.whitespace Whitespace characters move the cursor without printing anything. They create the white space between visible characters (at least on white paper). The constant string.whitespace contains all the whitespace characters, including space, tab (\t), and newline (\n). 7.11 Glossary 81 There are other useful functions in the string module, but this book isn¡Çt intended to be a reference manual. On the other hand, the Python Library Reference is. Along with a wealth of other documentation, it¡Çs available from the Python website, www.python.org. 7.11 Glossary compound data type: A data type in which the values are made up of components, or elements, that are themselves values. traverse: To iterate through the elements of a set, performing a similar operation on each. index: A variable or value used to select a member of an ordered set, such as a character from a string. slice: A part of a string specified by a range of indices. mutable: A compound data types whose elements can be assigned new values. counter: A variable used to count something, usually initialized to zero and then incremented. increment: To increase the value of a variable by one. decrement: To decrease the value of a variable by one. whitespace: Any of the characters that move the cursor without printing visible characters. The constant string.whitespace contains all the whitespace characters. 82 Strings Chapter 8 Lists A list is an ordered set of values, where each value is identified by an index. The values that make up a list are called its elements. Lists are similar to strings, which are ordered sets of characters, except that the elements of a list can have any type. Lists and strings—and other things that behave like ordered sets—are called sequences. 8.1 List |