Introduction to Python
Software version | 2.7 / 3.2 |
Website | Python Website |
Introduction
Python is an interpreted, multi-paradigm programming language. It supports structured imperative programming and object-oriented programming. It features strong dynamic typing, automatic memory management through garbage collection, and an exception handling system; thus it is similar to Perl, Ruby, Scheme, Smalltalk, and Tcl.
The Python language is under a free license similar to the BSD license and runs on most computing platforms, from supercomputers to mainframes, from Windows to Unix (including Linux and macOS), as well as on Java or even .NET. It is designed to optimize programmer productivity by providing high-level tools and a simple syntax. It is also appreciated by educators, who find in it a language whose syntax, clearly separated from low-level mechanisms, allows for an easier introduction to basic programming concepts.
Among all the programming languages currently available, Python is one of the easiest to learn. Python was created in the late ’80s and has matured enormously since then. It comes pre-installed in most Linux distributions and is often one of the most overlooked when choosing a language to learn. We’ll confront command-line programming and play with GUI (Graphical User Interface) programming. Let’s dive in by creating a simple application.
In this documentation, we’ll see how to write Python code using the interpreter (started with the python command). Lines typed at the interpreter are shown with the ‘>>>’ prompt, or ‘...’ for continuation lines.
When you write a file meant to be executed as a Python script, it should start with the following two lines (the shebang and the encoding declaration):
|
|
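As an illustration, a script header could look like the sketch below. The interpreter path (/usr/bin/env python), the utf-8 encoding, and the message variable are assumptions made for the example; adapt them to your system.

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The encoding declaration above lets Python 2 accept the accented
# characters in this string; Python 3 assumes UTF-8 source by default.
message = "Déjà vu"
print(len(message))
```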
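The encoding line is optional, but in Python 2 it is required as soon as the source contains non-ASCII characters such as accented letters. Python 3 assumes UTF-8 by default.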
Syntax
In Python, indentation is not just for readability: it delimits blocks, so the code will not work without it. You can indent with tabs or spaces, but be careful not to mix the two: Python 3 rejects inconsistent mixing with a TabError, while Python 2 may silently misread the block structure without any explicit error message.
You can use comments at the end of lines if you wish. Here’s an example:
|
|
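A minimal sketch of indentation and end-of-line comments (the variable names are illustrative):

```python
total = 0
for number in [1, 2, 3]:   # the colon opens a block
    total += number        # indented: part of the loop body
print(total)               # back to the previous level: runs once, after the loop
```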
A block ends when the indentation returns to the previous level.
help
At any time, you can ask for help with the help() function. For example, for help on the input function:
|
|
You can also access this help from the shell using the pydoc command:
|
|
Displaying text
Let’s see the first most basic command, displaying text:
|
|
Then here’s how to concatenate 2 elements:
|
|
When using the comma, strings are automatically separated by a space character, whereas when using concatenation, you have to manually manage this issue. If you concatenate non-string variables, you’ll need to convert them before you can display them:
|
|
If you don’t want to have an automatic line return (the equivalent of “\n”), you simply put a comma at the end of your line:
|
|
In Python 3.2, here’s how to print:
|
|
sep
In Python 3.2, with sep we can separate strings with characters:
|
|
end
In Python 3.2, with end we can add a character at the end of a string:
|
|
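The following sketch combines sep and end in a single Python 3 print() call. Writing to an in-memory buffer (io.StringIO) is only a convenience added here so that the result can be inspected; it is not part of the original text.

```python
import io

buffer = io.StringIO()
# sep is inserted between the values, end replaces the final line break
print("2012", "06", "06", sep="-", end="!\n", file=buffer)
output = buffer.getvalue()
print(output, end="")  # 2012-06-06!
```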
Data types
There are several data types in Python:
- Integers: in Python 2, plain integers are stored on at least 32 bits (from -2,147,483,648 to 2,147,483,647 on 32-bit platforms). In Python 3, integers have no size limit.
|
|
- Long integers (Python 2 only): any integer that cannot be represented as a plain integer. An integer can be forced to a long integer by appending the letter L or l. Python 3 has a single integer type and does not accept this suffix.
|
|
- Floats: The symbol used to determine the decimal part is the dot. You can also use the letter E or e to indicate an exponent in base 10.
|
|
Calculations
Be careful with division: the result depends on the operand types and on the Python version! See for yourself:
|
|
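A short sketch of the division pitfalls, written for Python 3 (in Python 2, dividing two integers with / truncates the result instead). The variable names are illustrative.

```python
true_quotient = 7 / 2      # 3.5 in Python 3 (3 in Python 2 with two integers)
floor_quotient = 7 // 2    # 3: floor division, same result in both versions
float_quotient = 7.0 / 2   # 3.5: one float operand is enough
remainder = 7 % 2          # 1: remainder of the division
print(true_quotient, floor_quotient, float_quotient, remainder)
```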
Be careful with operations! Here’s an example:
|
|
As you can see, there is a rounding issue: floats are stored in binary, so some decimal values cannot be represented exactly.
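One common way to live with float rounding is to compare with a tolerance or to round before displaying. The sketch below assumes a tolerance of 1e-9, which is an arbitrary choice for the example.

```python
total = 0.1 + 0.2
print(total == 0.3)              # False: the binary representation is inexact
print(abs(total - 0.3) < 1e-9)   # True: compare with a tolerance instead
print(round(total, 2))           # 0.3: round before displaying
```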
It’s possible to perform an operation using the old value of a variable with +=, -=, *=, and /= (variable += value is equivalent to variable = variable + value). For example:
|
|
And you can also perform multiple variable assignments with different values by separating variable names and values with commas:
|
|
To raise a number to any power, use the ** operator:
|
|
String manipulation
Concatenation (assembling several strings to produce only one) is done using the + operator. The * operator allows repeating a string. Examples:
|
|
Access to a particular character in the string is done by indicating its position in brackets (the first character is at position 0):
|
|
Booleans
Boolean values are written True or False. In a test context, the value 0, the empty string, None, and empty containers (lists, tuples, dictionaries) are considered False. Here are the comparison operators:
- ==: for equality
- !=: for difference
- <: less than
- >: greater than
- <=: less than or equal
- >=: greater than or equal
|
|
To combine tests, we use boolean operators:
- and
- or
- not
|
|
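A small sketch of combined tests and of values that count as False (the names and the age limits are made up for the example):

```python
age = 25
# "not" binds tighter than "and": this reads as (age >= 18) and (not (age > 120))
is_adult = age >= 18 and not age > 120

# Each of these values is considered False in a test context
empty_values = [0, "", None, [], {}]
all_false = not any(empty_values)
print(is_adult, all_false)
```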
Structures
if
Here’s how an if statement is constructed:
|
|
for
Here’s a for loop with a list:
|
|
We’ll loop over a list here with two new statements, continue and break:
|
|
- continue: allows moving to the next element
- break: stops the loop it’s in
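A minimal sketch of both statements in one loop (the list and the thresholds are arbitrary):

```python
visited = []
for number in [1, 2, 3, 4, 5, 6]:
    if number % 2 == 0:
        continue            # skip even numbers and move to the next element
    if number > 4:
        break               # leave the loop at the first odd number above 4
    visited.append(number)
print(visited)  # [1, 3]
```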
while
Here’s an example of a while loop:
|
|
- continue: allows moving to the next element
- break: stops the loop it’s in
Lists
Character strings
As with a list, we can access the elements of a string (its characters) by specifying an index:
|
|
However, it’s impossible to modify a string! We say it’s an immutable sequence:
|
|
In fact, here we’re making a new assignment: the variable chaine is overwritten (erased then replaced) with the value chaine + ‘!’. This is not a modification in the proper sense.
In Python 2.7, we’ll use formatting expressions from C:
- %d: for an integer
- %f: for a float
- %s: for a string
- %.3f: to force display of 3 digits after the decimal point
- %02d: to force display of an integer on two characters
|
|
In Python 3.2:
|
|
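The sketch below shows the formatting codes listed above with the % operator, next to the equivalent str.format() call, which works in both Python 2.7 and 3.2. The sample values are invented for the example.

```python
name, count, price = "apple", 5, 3.14159

# C-style formatting with the % operator
old_style = "%s: %02d items at %.3f" % (name, count, price)

# Same result with str.format()
new_style = "{0}: {1:02d} items at {2:.3f}".format(name, count, price)

print(old_style)  # apple: 05 items at 3.142
```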
Tuples
Tuples are also immutable sequences (character strings therefore behave much like special tuples):
|
|
As a reminder, a tuple containing only one element is noted in parentheses, but with a comma before the last parenthesis:
|
|
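A quick sketch of why the trailing comma matters (illustrative names):

```python
not_a_tuple = (42)   # just the integer 42 in parentheses
single = (42,)       # a tuple with one element
print(type(not_a_tuple).__name__, type(single).__name__)  # int tuple
```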
Using lists
Modifying lists
To modify lists:
|
|
Adding an element to the end of a list
The append() method adds an element to the end of a list:
|
|
Adding an element at a defined location in a list
The insert() method allows specifying the index where to insert an element:
|
|
Removing an element at a defined location in a list
To delete an element, you can use the remove() method, which removes the first occurrence of the value passed as a parameter, or the del statement, which removes the element at a given index:
|
|
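A minimal sketch contrasting removal by value and removal by index (the list content is arbitrary):

```python
letters = ["a", "b", "c", "b"]
letters.remove("b")   # removes the first "b" only
after_remove = list(letters)
del letters[0]        # removes the element at index 0
print(after_remove, letters)  # ['a', 'c', 'b'] ['c', 'b']
```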
Concatenating 2 lists
The extend() method allows concatenating two lists without reassignment. This operation is achievable with reassignment using the + operator:
|
|
Checking if an element belongs to a list
If the tested element is in the list, the returned value will be True and otherwise it will be False:
|
|
Getting the size of a list
If you want to know the number of elements present in a list:
|
|
List comprehension
Python implements a mechanism called “list comprehension”, which builds a new list by applying an expression to each element of an existing list:
|
|
Thanks to the instruction for x in l, we retrieve each element contained in the list l and place them in the variable x. We then calculate all the square values (x**2) in the list, which produces a new list. This mechanism can produce more complex results. It can also be used with dictionaries.
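The description above can be sketched as follows. The list l and the expression x**2 come from the text; the filter and the dictionary variant are additional illustrations.

```python
l = [1, 2, 3, 4]
squares = [x**2 for x in l]                      # [1, 4, 9, 16]
even_squares = [x**2 for x in l if x % 2 == 0]   # optional filter: [4, 16]
square_by_value = {x: x**2 for x in l}           # dictionary comprehension
print(squares, even_squares, square_by_value)
```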
Slicing
Slicing applies to all sequence types (strings, lists, tuples), but not to dictionaries. It cuts a sequence into pieces to retrieve some of its elements. It is written in this form:
d[start:end:step]
Here’s an example:
|
|
- No step indication was given, the default value is then used, that is, 1.
- If the start value is absent, the default value used will be 0
- If the end value is omitted, the default value used will be the length of the sequence
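A short sketch of the start, end, and step defaults, using an arbitrary sample string d:

```python
d = "Deimos"
print(d[0:3])    # Dei: characters 0, 1, and 2 (the end index is excluded)
print(d[:3])     # Dei: start defaults to 0
print(d[3:])     # mos: end defaults to the length of the string
print(d[::2])    # Dio: every second character
print(d[::-1])   # somieD: a negative step walks backwards
```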
Reading a string
To retrieve the entire string:
|
|
To invert the direction:
|
|
Accessing the last element of the list
To access the last element of the list:
|
|
With the interval [-2:-1] you get the second-to-last letter but not the last one, and [-2:0] returns nothing, since it’s impossible to go from -2 to 0 with a step of 1 (the last letter being -1).
On Tuples
Applied to lists and tuples, slicing reacts in the same way. The difference is that we no longer manipulate only characters but any type of data:
|
|
Deleting elements from a list
Here’s slicing on a list to remove elements:
|
|
Copying lists
Look at this if you want to make copies of a list:
|
|
You’ll notice that both lists are affected, because the second name is only an alias for the same list! To make a real copy, there are two solutions. Here’s the first, which uses [:]:
|
|
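A minimal sketch of the difference between an alias and a copy made with [:] (the names are illustrative):

```python
original = [1, 2, 3]
alias = original             # second name for the same list
shallow_copy = original[:]   # new list with the same elements

original.append(4)
print(alias)         # [1, 2, 3, 4]: the alias sees the change
print(shallow_copy)  # [1, 2, 3]: the copy does not
```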
Dictionaries
Deleting keys
Dictionaries are collections of key => value pairs. As with lists, the del statement removes an element, and the in keyword checks for the existence of a key:
|
|
If the elements are not displayed in the desired order, it’s simply because dictionaries are not ordered in the Python versions covered here (since Python 3.7, they preserve insertion order)!
Getting keys and values
In Python 2.7, to read the keys and values of a dictionary:
|
|
In Python 3.2, it’s a little different:
|
|
In Python 3, keys(), values(), and items() return views rather than lists. Iterating over a view doesn’t consume as much memory as building a list: only a pointer to the current element is stored.
Returning a default value if an element is not found
The get() method returns a substitute value when a key is not found in a dictionary:
|
|
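A short sketch of get() with and without a substitute value (the dictionary content is invented):

```python
ages = {"alice": 30}
known = ages.get("alice")        # 30: the key exists
missing = ages.get("bob")        # None: default substitute value
fallback = ages.get("bob", 0)    # 0: explicit substitute value
print(known, missing, fallback)
```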
Concatenating dictionaries
Just as with lists, you can merge dictionaries, here with the update() method:
|
|
Copying a dictionary
To copy an already existing dictionary, use the copy() method:
|
|
Complex Dictionaries and Lists
pickle
There is a solution to simply save and restore complex structures (multidimensional lists/dictionaries for example) through serialization, called pickle. Using pickle remains very dangerous: only load files whose origin you trust, because loading a corrupted or tampered file can lead to execution of malicious code!
|
|
dump
To write data, we’ll open a file in binary mode (wb). We’ll then use the dump() function to write a variable by specifying the file object as a parameter:
|
|
To read data from a file (rb), we’ll use the load() function by passing the file object as a parameter:
|
|
dumps
If you want to see the serialization of a variable, you can use the dumps() function, which performs the same task as dump(), but into a string rather than a file:
|
|
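A minimal round trip with dumps() and its counterpart loads(), kept in memory so that no file is needed. The data structure is made up for the example, and the result is bytes in Python 3 (a str in Python 2).

```python
import pickle

data = {"name": "Deimos", "scores": [1, 2, 3]}
serialized = pickle.dumps(data)       # serialize to a byte string
restored = pickle.loads(serialized)   # rebuild an equal, independent object
print(restored == data, restored is data)  # True False
```

As with load(), only call loads() on data you trust.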
ConfigParser
This module allows managing ini-type files (it is named ConfigParser in Python 2 and configparser in Python 3). It’s very convenient for managing configuration files. For those who don’t know what such a file looks like:
|
|
We can access the value of an option in a particular section:
|
|
To get everything from a section:
|
|
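The sketch below uses the Python 3 module name and reads the configuration from a string to stay self-contained. The section and option names (server, host, port) are assumptions, not the original example.

```python
import configparser  # named ConfigParser in Python 2

ini_text = """
[server]
host = localhost
port = 8080
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

host = config.get("server", "host")      # value of one option, as a string
port = config.getint("server", "port")   # same, converted to an integer
items = dict(config.items("server"))     # everything in the section
print(host, port, items)
```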
Stdin
Standard input (stdin) lets a script display a prompt and read the characters the user types, validated with the Enter key. In Python 2.7 there are 2 solutions:
- input(): evaluates what is typed as a Python expression, typically to retrieve an integer or a float
- raw_input(): retrieves a string
input
With the input() function, entering a string will result in an attempt to interpret it as a variable and thus cause an error:
|
|
In Python 3.2, there is only one input() function, which behaves like raw_input() and always returns a string.
The main conversion functions are:
- int() to convert to integer
- float() to convert to float
- complex() to convert to complex
- str() to convert to string (useless when using with input()).
raw_input
raw_input() is simpler since any input will be considered a string:
|
|
Handles
Moving to a directory
It’s possible to move to a directory via the chdir() function:
|
|
Reading a file
It’s possible to open files for reading(r), writing(w), and appending(a):
|
|
Here for example, we’ll open a file named fichier.txt and we need to enter the mode (r/w/a).
Read
We can read the entire file to work on its content afterward (be careful about the file size which will be stored in memory):
|
|
You’ll note that line breaks are shown as \n rather than interpreted, because the interpreter displays the raw representation of the string! To interpret them, we’ll need to use print:
|
|
It’s possible to read x characters from the reading location (the beginning by default or from another location if you’ve already started reading the file):
|
|
We read the first 8 characters, then the next 8.
Readlines
readlines() works like the read() function, except that the data is returned as a list in which each element contains one line (with a \n at the end of each line):
|
|
Writing to a file
To write to a file, it’s very simple, we’ll call the write function:
|
|
Strip
You may know chomp in Perl: the strip() method does something similar, removing whitespace characters (spaces, tabs, line breaks) from the beginning and end of a string. It is a string method (also exposed by the string module in Python 2) and has two variants:
- lstrip(): only removes characters at the beginning of the string (’l’ for left)
- rstrip(): only removes characters at the end of the string (‘r’ for right)
|
|
Functions
Here’s how we define a function (def) and we call it with its name and parentheses at the end:
|
|
Function documentation
Documentation related to a function is written using triple quotes:
|
|
You saw then, we can ask for help on a function directly from Python.
Function parameters
In Python, all parameters are passed as references to objects (their memory addresses)! However, since some types are immutable, they seem to be passed by value. The immutable types are the simple types (integers, floats, complex numbers, etc.), strings, and tuples:
|
|
With a modifiable parameter, such as a list, the modifications will be visible when exiting the function.
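A small sketch of both behaviours side by side (the function and variable names are hypothetical):

```python
def modify(number, items):
    number += 1        # rebinds the local name only: the caller's integer is untouched
    items.append(4)    # changes the list object shared with the caller

count = 10
values = [1, 2, 3]
modify(count, values)
print(count, values)  # 10 [1, 2, 3, 4]
```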
Here we ask it to take the function argument (x) and raise it to the power. The function will return the result thanks to the return. If a function doesn’t have a return specified, the return value will be None.
|
|
If parameters are not specified when calling the function, the default values will be used. The only requirement is that parameters having a default value must be specified at the end of the parameter list. Here’s an example with its calls:
|
|
You’ve understood, we can therefore declare default values and override them on demand by passing them as arguments. The order of a function’s parameters is not fixed if we name them:
|
|
The number of arguments of a function
In Python, you don’t have to define beforehand the number of arguments that will be used. For this, we use *args and **kwargs:
- *args: tuple containing the list of parameters passed by the user
- **kwargs: dictionary of parameters
The difference between the two syntaxes (* or **) is the type of the variable received by the function:
|
|
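A minimal sketch showing the types received through *args and **kwargs. The describe() function is invented for the illustration.

```python
def describe(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dictionary of named ones
    return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)

result = describe(1, 2, 3, color="red", size=10)
print(result)  # ('tuple', 'dict', 3, ['color', 'size'])
```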
Modules
Modules are very useful, as they correspond to ready-made functions, saving a lot of time when we use them. Imports can be placed anywhere in the code, including inside conditional blocks. But they are much easier to find at the top of the file.
To load a module with all its functions:
|
|
Here we ask Python to load all functions (*) of the module_name1 module into memory. If we don’t use *, we’ll need to prefix each function with the module name when calling it:
|
|
If you wish to import only a few functions from a module (takes less memory space and allows using only what we want):
|
|
Note: In case of identical function names, the last import takes precedence over those before!
If you’re handling modules with a very long name, you can define an alias with the keyword ‘as’. For example, a module named ModuleWithAVeryLongName containing the func() function. With each call to func() you’re not going to write: ModuleWithAVeryLongName.func()… So you need to use as:
|
|
If during a module import, it contains instructions for immediate execution (which are not in functions), these will be executed at the time of import. To differentiate the behavior of the interpreter during direct execution of a module or during its loading, there is a specific test to add which allows determining a sort of main program in the module:
|
|
To define the body of the main program, insert this line:
|
|
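The usual form of this test is sketched below. The main() function name is a common convention and an assumption here, not something imposed by Python.

```python
def main():
    # Body of the "main program" of this module
    return "running as a script"

# True only when the file is executed directly, not when it is imported
if __name__ == "__main__":
    print(main())
```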
Module Paths
By default, Python starts by searching:
- In the current directory
- In the directory(ies) specified by the PYTHONPATH environment variable (if defined)
- In the Python library directory: /usr/lib/python
So be very careful when naming your files: if they have the name of an existing Python module, they will be imported instead of the latter since the import first looks for modules in the current directory.
Here’s an example of the PYTHONPATH (environment variables):
|
|
To indicate that A is a “module directory”, we’ll create an additional file in the folder containing the modules: ‘__init__.py’. This file may be empty or contain functions. Such module directories are called packages.
String
string is installed by default with Python. It provides many methods to search for text and replace strings. It helps avoid using regex which can in some cases be very CPU intensive if poorly written.
split
The split() function allows cutting a string passed as a parameter following one or more separator characters and returns a list of the cut strings. If nothing is specified, the space character will be used:
|
|
Join
Join is the opposite of split and allows joining several elements together via one or more separator characters:
|
|
Lower
Lower allows converting a string to lowercase:
|
|
Attention: although we talk about conversion, these methods do not modify the original string; they return a new string!
Upper
Upper allows converting a string to uppercase:
|
|
Capitalize
This function allows capitalizing only the first letter of a string:
|
|
Capwords
Capwords allows capitalizing the beginning of each word:
|
|
Count
The count() function allows counting the number of occurrences of a substring in a string. The first parameter is the string in which to perform the search and the second parameter is the substring:
|
|
Find
The find() function allows finding the index of the first occurrence of a substring. The parameters are the same as for the count() function:
|
|
In case of failure, find() returns the value -1 (0 corresponds to the index of the first character).
Replace
The replace() function allows, as its name suggests, replacing a substring with another inside a string of characters. The parameters are, in order:
- The string to modify
- The substring to replace
- The replacement substring
- The maximum number of occurrences to replace (if not specified, all occurrences will be replaced), this is optional
|
|
Maketrans
maketrans() creates the translation table and thanks to the translate() function, applies the translations to a string of characters:
|
|
When creating the table with maketrans(), each character in the first string is replaced by the character at the same position in the second string. So e’s are replaced with 3’s, a’s with 4’s, and i’s with !.
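The mapping described above can be sketched in Python 3, where maketrans() is a static method of str (in Python 2 it lives in the string module). The sample sentence is arbitrary.

```python
# e -> 3, a -> 4, i -> !
table = str.maketrans("eai", "34!")
leet = "elite ai".translate(table)
print(leet)  # 3l!t3 4!
```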
Substitute and Safe_substitute
There’s a simpler approach than maketrans(). We can write this in a more readable and more flexible way:
|
|
- $$ allows displaying the character $
- ${var_name} allows isolating a variable included in a word
You should use the safe_substitute() function if you don’t want Python to cause an error in case of non-substitution (because absent from the dictionary):
|
|
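A short sketch of substitute() and safe_substitute() with string.Template. The template text and the variable names (who, item) are invented for the example.

```python
from string import Template

template = Template("$who paid $$5 for ${item}s")

full = template.substitute(who="Tim", item="apple")
print(full)     # Tim paid $5 for apples

# safe_substitute() leaves unknown placeholders untouched instead of raising KeyError
partial = template.safe_substitute(who="Tim")
print(partial)  # Tim paid $5 for ${item}s
```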
Regex
A regex (regular expression) describes a search pattern. It allows, for example, finding elements in a line or phrase when we know their general shape but not their exact content.
Search
We’ll use the re module which offers the search() function:
|
|
If the pattern is not found, the search() function returns the value None.
If you frequently use the same search pattern, it’s better to compile it to have more efficient code:
|
|
When you use complex patterns, it’s recommended to comment them and therefore write a “verbose” regular expression. In this mode, whitespace and comments in the pattern are ignored:
|
|
Attention: the search() function only allows finding the first substring matching the searched pattern!
List of regex
Here are the most common regex:
Operator | Description | Example |
---|---|---|
^ | Beginning of line | ^Deimos for ‘Deimos Fr!!!’ |
$ | End of line | !$ for ‘Deimos Fr!!!’ |
. | Any character | D.im.s for ‘Deimos Fr!!!’ |
* | Repetition of the previous character from 0 to x times | !* for ‘Deimos Fr !!!’ |
+ | Repetition of the previous character from 1 to x times | !+ for ‘Deimos Fr !!!’ |
? | Repetition of the previous character from 0 to 1 time | F? for ‘Deimos Fr!!!’ |
\ | Escape character | \. for ‘Deimos Fr.’ |
a,b,…z | Specific character | Deimos for ‘Deimos Fr’ |
\w | Alphanumeric character or underscore (a…z,A…Z,0…9,_) | \weimos for ‘Deimos Fr’ |
\W | Anything except an alphanumeric character | I\Wll for ‘I’ll be back’ |
\d | A digit | \d for 1 |
\D | Anything except a digit | \Deimos for Deimos |
\s | A spacing character such as: space, tab, carriage return or line break (\f,\t,\r,\n) | Deimos\sFr for ‘Deimos Fr’ |
\S | Anything except a spacing character | Deimos\SFr for ‘Deimos_Fr’ |
{x} | Repeats the previous character exactly x times | !{3} in ‘Deimos Fr !!!’ |
{x,} | Repeats the previous character at least x times | !{2,} in ‘Deimos Fr !!!’ |
{,x} | Repeats between 0 and x times the previous character | !{,3} in ‘Deimos Fr !!!’ |
{x,y} | Repeats between x and y times the previous character | !{1,3} in ‘Deimos Fr !!!’ |
[] | Allows setting a range (from a to z[a-z], from 0 to 9[0-9]…) | [A-D][0-5] in ‘A4’ |
[^] | Allows specifying unwanted characters | [^0-9]eimos in ‘Deimos’ |
() | Allows recording the content of parentheses for later use | (Deimos) in ‘Deimos Fr’ |
| | Allows choosing between alternatives (or) | (Org|Fr|Com) in ‘Deimos Fr’ |
There’s a site allowing visualizing a regex: Regexper (sources: https://github.com/javallone/regexper)
Searching all patterns
If you want to find all occurrences of a string, use findall(). It works the same way as search() but returns a list of strings corresponding to the pattern searched:
|
|
If you want to get an element providing you with the same information as search(), it’s the finditer() function you’ll need to use:
|
|
Memorizing and using found patterns
Regular expressions allow referencing an element noted in parentheses. We can then call them by referring (from left to right) to parenthesis number n (preceded by a backslash). To summarize, for the first parenthesis encountered, you use \1, for the second \2, etc…:
|
|
We prefixed the pattern with r (r’…’). This defines a raw string, in which backslashes are not interpreted as escape sequences by Python and are therefore passed to the regex engine as characters in their own right.
For better readability, it’s possible to name the captured elements:
|
|
And use them:
|
|
Example:
|
|
Groups
We can also retrieve the captured values after a call to a search()-type function. Elements are retrieved as a tuple with the groups() function, or one at a time with the group() function, which takes the number of the element as a parameter (numbering starts at 1; for index 0 the function returns the whole match).
|
|
We can also get them in a dictionary using the groupdict() function (if you’ve named the captured elements, this is the method you’ll need to choose). Let’s first see an example of application with the groups() and group() functions:
|
|
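A minimal sketch of group(), groups(), and groupdict() on a date pattern. The pattern, the group names, and the sample text are assumptions made for the example.

```python
import re

pattern = r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})"
match = re.search(pattern, "Updated on 06/06/2012.")

whole = match.group(0)      # the whole match
first = match.group(1)      # first captured element
captured = match.groups()   # all captured elements, as a tuple
named = match.groupdict()   # captured elements by name
print(whole, first, captured, named)
```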
Search parameters
It’s possible in a search to indicate certain parameters, we’ll see them here:
Option | Description |
---|---|
IGNORECASE | Performs a case-insensitive search |
MULTILINE | ^ and $ match at the beginning and end of each line (lines being separated by the character \n), not only at the beginning and end of the whole string |
DOTALL | The dot (character .) also matches a line break, so a pattern can span several lines |
UNICODE | Allows using Unicode encoding for the search (useful with accented characters). Your strings will need to be given in Unicode format by prefixing them with u (Useless in Python 3.2): u’Unicode string' |
You can also specify several options with a pipe:
|
|
Parameters in patterns
The previous parameters can be indicated directly in the patterns (?option):
Pattern option | Correspondence |
---|---|
i | IGNORECASE |
m | MULTILINE |
s | DOTALL |
u | UNICODE |
x | VERBOSE |
Here’s an example:
|
|
Substitution
Substitution is done using the sub() method and returns the modified string:
|
|
For named captured elements, we’ll use \g<name>:
|
|
sub() performs substitutions for all occurrences of the pattern found!
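To limit substitutions to the first n occurrences of the pattern, pass the count argument to sub(). The subn() function works like sub() but returns both the modified string and the number of substitutions made: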
|
|
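A short sketch of sub(), its count argument, and subn() on an arbitrary sample string:

```python
import re

text = "a1b22c333"
all_replaced = re.sub(r"\d+", "#", text)          # every occurrence
first_only = re.sub(r"\d+", "#", text, count=1)   # first occurrence only
result, how_many = re.subn(r"\d+", "#", text)     # new string and number of substitutions
print(all_replaced, first_only, result, how_many)  # a#b#c# a#b22c333 a#b#c# 3
```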
Split
Finally, let’s note the existence of a split() function where the separation characters are determined by a pattern:
|
|
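A minimal sketch of re.split() with a pattern accepting commas, semicolons, and whitespace as separators (the pattern and the input are invented):

```python
import re

# One or more separator characters in a row count as a single separation
fields = re.split(r"[;,\s]+", "alpha, beta;gamma  delta")
print(fields)  # ['alpha', 'beta', 'gamma', 'delta']
```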
argv
In Python, the simplest way to work taking into account the input of arguments on command lines is to use the sys module.
|
|
You’ll notice that you need to subtract 1 from the length of sys.argv to get the number of arguments, because the first element is the script name!
When launching the script:
|
|
argparse
The argparse module replaces the optparse module, which is now deprecated. We’ll create an ArgumentParser object that will contain the list of arguments:
|
|
- description: allows indicating by a small comment what your script does
- epilog: the text that will be displayed at the end of the automatically generated help. These texts will be used when displaying the automatically generated help.
Adding arguments to argparse
Let’s define a -a option that collects no information and will only indicate if the user used it when calling the command (True) or not (False):
|
|
Don’t forget that this module automatically handles help :-). However, we can declare the command version:
|
|
We can use -v or --version here, knowing that the first option name starting with -- is the one used as the attribute name when retrieving arguments.
- If you want to make an argument mandatory:
|
|
- You can specify the number of necessary arguments:
|
|
Here the number of expected arguments is 2 and they will be returned in a list (args.a). You can make this value optional by passing ? to nargs. As with regex, you can also pass * to nargs for 0 or more arguments, and + for 1 or more arguments.
Argument actions
There are several possible actions for arguments:
- store: This is the default action that saves the value the user will have entered
- store_const: Stores a predefined constant value if the user passes the argument. Otherwise the args.v variable will contain the value None:
|
|
- store_true/store_false: If the argument is specified, the value will be true, otherwise false. In case of absence of argument, the args.v variable takes the boolean value opposite to that indicated by the store action.
- append: Records the value in a list. If several calls to the same argument are specified, then they are added to this same list:
|
|
- append_const: It’s a mix between store_const and append. It allows saving predefined elements in a list:
|
|
- version: Allows indicating the software version.
|
|
%(prog)s refers to the name of the running script (like sys.argv[0]).
Choosing your argument prefix characters
If we want to use prefix characters other than - or --, it’s possible, for example, to add +:
|
|
We can for example have a use case where we add options (+option) or remove them (-option).
Forcing values
It’s possible to force values entered by the user by specifying a type (int, float, bool, file…). This allows converting data on the fly and checking that the values entered by the user are of the expected type:
|
|
We can also define a default value in all circumstances:
|
|
Note: A call to the command without specifying an -a argument will still initialize the args.i variable with the value 10.
Managing duplicates
If we declare the same argument twice, it’s possible (but not clean) to have Python resolve this automatically. The last value will then overwrite the first one(s):
|
|
Option groups
- For better readability, it’s possible to group certain options when displaying help:
|
|
- It’s also possible to define groups where only one option is selectable under penalty of being rejected if more than one is requested:
|
|
Using the command
The script we just created has the -a option and -h for help which is automatically generated:
|
|
Using arguments
To display the list of arguments:
|
|
To display the value of the -a argument:
|
|
Retrieving arguments
To retrieve the arguments that the user will have entered:
|
|
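To tie the previous sections together, here is a minimal sketch of a complete parser. The option names (-i, -p/--point), the help texts, and the argument list passed to parse_args() are assumptions made so that the example runs without a real command line.

```python
import argparse

parser = argparse.ArgumentParser(
    description="Demo script",
    epilog="Text shown at the end of the help.",
)
parser.add_argument("-a", action="store_true", help="a simple flag")
parser.add_argument("-i", type=int, default=10, help="an integer")
parser.add_argument("-p", "--point", nargs=2, type=float, help="two coordinates")

# Without an explicit list, parse_args() reads sys.argv[1:]
args = parser.parse_args(["-a", "--point", "1.5", "2"])
print(args.a, args.i, args.point)  # True 10 [1.5, 2.0]
```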
Last updated 06 Jun 2012, 12:45 CEST.