Introduction to Perl
Introduction
This documentation is intended for people with good knowledge of software development, as it simply provides notes on Perl syntax and how it works.
Perl is a programming language created by Larry Wall in 1987, incorporating features from the C language and scripting languages like sed, awk, and shell.
Larry Wall gives two interpretations of the “PERL” acronym:
- Practical Extraction and Report Language
- Pathologically Eclectic Rubbish Lister
These names are retroactive acronyms.
The organization responsible for the development and promotion of Perl is The Perl Foundation. In France, “Les Mongueurs de Perl” promotes this language, notably through Perl Days events.
Basics
Exponents:
|
|
Simplified integer literal notation:
|
|
Here are the escape character sequences:
Construct | Meaning |
---|---|
\n | Newline |
\r | Carriage return |
\t | Tab |
\f | FormFeed |
\b | Backspace |
\a | Bell |
\e | Escape (ASCII escape character) |
\007 | Any octal ASCII value (here, 007 = bell) |
\x7F | Any hex ASCII value (here, 7F = delete) |
\cC | A “control” character (here, Ctrl-C) |
\\ | Backslash |
\" | Double quote |
\l | Lowercase next letter |
\L | Lowercase all following letters until \E |
\u | Uppercase next letter |
\U | Uppercase all following letters until \E |
\Q | Quote non-word characters by adding a backslash until \E |
\E | Terminate \L, \U, or \Q |
String operators (concatenation):
|
|
Be careful with string operators for numbers:
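As a rough illustration of the operators in this section, here is a minimal sketch. The variable names are hypothetical and not taken from the original examples. It shows exponentiation, underscores in integer literals, concatenation, and how the operator (not the operand) decides whether a value is treated as a number or a string:

```perl
use strict;
use warnings;

# Exponentiation and underscore-separated integer literals
my $power   = 2 ** 10;                    # 1024
my $million = 1_000_000;                  # same value as 1000000

# Concatenation (.) and repetition (x)
my $greeting = "Hello" . " " . "world";   # "Hello world"
my $dashes   = "-" x 5;                   # "-----"

# The operator decides between number and string
my $product = "5" * "5";                  # numeric multiplication: 25
my $joined  = "5" . "5";                  # string concatenation: "55"
my $repeat  = "5" x 3;                    # string repetition: "555"

print "$power $million $greeting $dashes $product $joined $repeat\n";
```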
Binary assignment operator:
|
|
Avoid:
|
|
Operator precedence and associativity:
Associativity | Operators |
---|---|
left | parentheses and arguments to list operators |
left | -> |
nonassoc | ++ -- (autoincrement and autodecrement) |
right | ** |
right | \ ! ~ + - (unary operators) |
left | =~ !~ |
left | * / % x |
left | + - . (binary operators) |
left | << >> |
nonassoc | named unary operators (-X filetests, rand) |
nonassoc | < <= > >= lt le gt ge (the “unequal” ones) |
nonassoc | == != <=> eq ne cmp (the “equal” ones) |
left | & |
left | \| ^ |
left | && |
left | \|\| |
nonassoc | .. ... |
right | ? : (ternary) |
right | = += -= *= etc. (and similar assignment operators) |
left | , => |
nonassoc | list operators (rightward) |
right | not |
left | and |
left | or xor |
Comparison operators:
Comparison | Numeric | String |
---|---|---|
Equal | == | eq |
Not equal | != | ne |
Less than | < | lt |
Greater than | > | gt |
Less than or equal to | <= | le |
Greater than or equal to | >= | ge |
Loops
If
Here is the structure of an if statement:
|
|
To get STDIN as a scalar variable (Reminder: STDIN is used to capture what the user types on the keyboard):
|
|
while
Here is the structure of a while loop:
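A minimal sketch combining a while loop with an if/else test. The variables $count and $total are hypothetical names chosen for this illustration:

```perl
use strict;
use warnings;

my $count = 0;
my $total = 0;

# while repeats its block as long as the condition is true
while ($count < 5) {
    $count += 1;
    # if/else runs exactly one of its two blocks
    if ($count % 2 == 0) {
        $total += $count;   # even values: 2 and 4
    } else {
        $total -= 1;        # odd values: 1, 3 and 5
    }
}

print "count=$count total=$total\n";   # count=5 total=3
```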
Useful Functions
Defined
Checks if a value is defined or not
|
|
Chomp
Removes the trailing newline from a string:
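A short sketch of both functions, using hypothetical variables ($line, $nothing):

```perl
use strict;
use warnings;

my $line = "hello\n";
chomp($line);                 # $line is now "hello"

my $nothing;                  # declared but never assigned: undef
my $status = defined($nothing) ? "defined" : "undefined";

print "[$line] $status\n";    # [hello] undefined
```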
Arrays
Here’s how to set the 0th entry of an array:
|
|
And here’s how you can calculate an array index:
|
|
To create 99 undefined elements:
|
|
To access the last element of the array, there are two methods:
|
|
To know the number of elements in an array:
|
|
Now $numbers contains the number of elements.
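The array operations described so far can be sketched as follows. The array @names and its values are only an illustration, not the original listing:

```perl
use strict;
use warnings;

my @names;
$names[0]  = "tata";                 # set element 0
$names[99] = "toto";                 # elements 1..98 now exist and are undef

my $last_index = $#names;            # index of the last element: 99
my $last_a     = $names[$#names];    # "toto"
my $last_b     = $names[-1];         # "toto" as well (negative index)
my $count      = @names;             # array in scalar context: 100

@names = ();                         # empty the array
my $after = @names;                  # 0

print "$last_index $last_a $last_b $count $after\n";
```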
To empty an array:
|
|
Lists
Here’s an example of lists:
|
|
qw
If for instance we want to have a list like this:
|
|
To simplify things, we can do this:
|
|
You can replace / with !, #, (), {}, [] or <>.
The advantage of these notations is that extra spaces and new lines are automatically ignored; only your words are taken into account. Here’s another possible form of writing:
|
|
We can also do:
|
|
List Assignments
To assign lists:
|
|
Here toto equals un, tata equals deux…
We can swap this way:
|
|
Be careful with ignored elements like:
|
|
Here, only the first 2 will be considered. For the reverse:
|
|
$tata will have an undef value.
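A minimal sketch of these list assignments. The values are hypothetical, and the extra variables $first, $second, $x and $y are mine:

```perl
use strict;
use warnings;

my ($toto, $tata) = qw(un deux);      # $toto = "un", $tata = "deux"
($toto, $tata) = ($tata, $toto);      # swap: $toto = "deux", $tata = "un"

my ($first, $second) = qw(a b c d);   # the extra values "c" and "d" are ignored
my ($x, $y) = ("only");               # not enough values: $y is undef

print "$toto $tata $first $second $x ", (defined $y ? "defined" : "undef"), "\n";
```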
pop
Pop removes the last element of an array and returns it:
|
|
When the array is empty, pop returns undef.
push
Push is used to add values to the end of arrays:
|
|
shift
Shift is like pop but acts at the beginning of the array:
|
|
unshift
Unshift is like push, but acts at the beginning of the array:
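The four functions can be compared side by side in a small sketch (the array @queue is a hypothetical example):

```perl
use strict;
use warnings;

my @queue = (1, 2, 3);

push @queue, 4;                # add at the end:         (1, 2, 3, 4)
my $popped = pop @queue;       # remove from the end:    4, leaving (1, 2, 3)
unshift @queue, 0;             # add at the beginning:   (0, 1, 2, 3)
my $shifted = shift @queue;    # remove from the start:  0, leaving (1, 2, 3)

my @empty;
my $nothing = pop @empty;      # empty array: pop returns undef

print "@queue\n";              # 1 2 3
```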
foreach
A foreach will allow you to iterate through a complete list:
|
|
$_
This is probably the most used default variable in Perl. It allows you to not declare variables in a loop. For example:
|
|
Another example:
|
|
reverse
Reverse takes a list of values and returns the list in reverse order:
|
|
sort
Like the sort binary on Unix, it sorts alphabetically (but in ASCII order):
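A small sketch of reverse and sort on a hypothetical list of numbers. The numeric sort with a `{ $a <=> $b }` block is not covered in this text; it is added here only to contrast with the default ASCII order:

```perl
use strict;
use warnings;

my @numbers  = (10, 9, 100, 1);

my @reversed = reverse @numbers;               # (1, 100, 9, 10)
my @ascii    = sort @numbers;                  # string order:  (1, 10, 100, 9)
my @numeric  = sort { $a <=> $b } @numbers;    # numeric order: (1, 9, 10, 100)

print "@reversed | @ascii | @numeric\n";
```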
List and Scalar Context
It’s important to understand this section:
|
|
|
|
To force a scalar context:
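A minimal sketch of the two contexts, assuming a hypothetical array @people. The same array gives its elements in list context and its element count in scalar context, and scalar() forces the latter:

```perl
use strict;
use warnings;

my @people = qw(toto tata titi);

my @copy  = @people;        # list context: the three elements
my $count = @people;        # scalar context: 3
my $sum   = @people + 2;    # a numeric operator imposes scalar context: 5

# scalar() forces scalar context where a list would otherwise be expected
my $text = "There are " . scalar(@people) . " people";

print "$text\n";            # There are 3 people
```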
Functions
Define with sub:
|
|
And call the function with an ampersand (&):
|
|
Here’s an example:
|
|
Function Arguments
Let’s pass 2 arguments to a function:
|
|
And in the function, we get the 2 arguments with $_[0] and $_[1] (but only in the function!), which are elements of the @_ list. These variables have nothing to do with the $_ variable. Here’s an example:
|
|
If an argument is not passed, the corresponding element of @_ is undef.
To ensure receiving the exact number of arguments:
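One possible way to do this is sketched below. The function name max_of_two is hypothetical, and the choice to return undef on a wrong argument count is an assumption of this example:

```perl
use strict;
use warnings;

sub max_of_two {
    # @_ holds the arguments; in scalar context it gives their count
    if (@_ != 2) {
        return;                    # wrong number of arguments: give back undef
    }
    my ($left, $right) = @_;       # same values as $_[0] and $_[1]
    return $left > $right ? $left : $right;
}

my $biggest = max_of_two(3, 8);    # 8
my $refused = max_of_two(1, 2, 3); # undef: three arguments were passed

print "$biggest\n";
```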
my
The my operator creates private (lexical) variables. Declared in a function, such a variable is only visible inside that function and is released from memory when the function ends.
|
|
We can simplify things:
|
|
local
Local is older than my and works differently: it does not create a new variable, but saves a copy of an existing global variable’s value in a hidden place (a stack). This saved value cannot be consulted, modified, or deleted while it’s saved. Then local sets the variable to an empty value (undef for scalars, or empty list for arrays), or to the assigned value. When returning from the function, the variable is automatically restored to its original value. In short, the variable was borrowed for a time and returned before anyone noticed.
You cannot blindly replace local with my in old Perl scripts, but prefer my when creating new scripts. Example:
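The difference can be sketched as follows. The package variable $level and the three small functions are hypothetical names for this illustration:

```perl
use strict;
use warnings;

our $level = "global";             # package variable (local only works on these)

sub show { return $level; }        # reads the package variable

sub with_local {
    local $level = "temporary";    # old value saved, restored when the sub exits
    return show();                 # show() sees "temporary"
}

sub with_my {
    my $level = "private";         # a brand new lexical, invisible to show()
    return show();                 # show() still sees the package variable
}

my $during_local = with_local();   # "temporary"
my $during_my    = with_my();      # "global"
my $afterwards   = show();         # "global": local gave the value back

print "$during_local $during_my $afterwards\n";
```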
return
Return immediately exits a function and gives back a value. For example, a return placed in a foreach can give back an element as soon as it has been found, without finishing the loop.
use strict
To write “clean” code, it’s better to put this in your scripts:
|
|
Hashes
A hash table (or hash) is like an array except that instead of numeric indices, its elements are accessed by keys (arbitrary strings).
In case it’s not clear enough, here’s the difference:
- Array:
Identifier or key (not modifiable) | 0 | 1 | 2 |
---|---|---|---|
Value (modifiable) | tata | titi | toto |
- Hash table:
Identifier or key (modifiable) | IP | Machine | Name |
---|---|---|---|
Value (modifiable) | 192.168.0.1 | toto-portable | Toto |
We declare a hash table this way:
|
|
To copy a column:
|
|
Perl decides the layout of keys in the hash table. You won’t have the possibility to organize them as you wish! This allows Perl to access information you’re looking for more quickly.
List
To declare a hash list, we can do:
|
|
There’s another way, much clearer, to make list declarations:
|
|
We can leave the comma on the last line without any problems. This can be convenient in some cases :-)
reverse
The reverse on hash lists will prove very useful! Indeed, as you know, to perform searches, we can only take keys to find values. If we want to search in the reverse direction (values become keys, and keys become values):
|
|
But don’t think that this doesn’t take resources, because contrary to what one might think, a copy of a list does a complete unwinding, then a one-by-one copy of all elements. It’s exactly the same for a simple copy of a list:
|
|
Unfortunately, once again, this makes Perl work extremely hard! So if you have the possibility to avoid this kind of thing, that’s good :-)
Hash Functions
Keys and values
Keys and values are 2 hash functions that return only the keys or only the values:
|
|
Here @keys will contain IP, Machine and Name, while @values will contain the corresponding values.
In scalar context, keys and values give the number of elements in the hash. Note: an empty hash is considered false in a condition.
each
To iterate through an entire hash, we’ll use each which returns a key/value pair in the form of a 2-element list:
|
|
exists
To know if a key exists in a hash:
|
|
delete
The delete function will delete the key in question:
|
|
Note: This is not equivalent to inserting undef into a hash element.
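The hash functions above can be sketched together, reusing the IP/Machine/Name example table. The variable names are mine, and the keys are sorted only because Perl does not guarantee their order:

```perl
use strict;
use warnings;

my %host = (
    IP      => "192.168.0.1",
    Machine => "toto-portable",
    Name    => "Toto",
);

my @keys  = sort keys %host;           # ("IP", "Machine", "Name")
my $count = keys %host;                # scalar context: 3

my @pairs;
while (my ($key, $value) = each %host) {
    push @pairs, "$key=$value";        # one key/value pair per iteration
}
@pairs = sort @pairs;

$host{Name} = undef;                   # the key still exists, its value is undef
my $exists_after_undef  = exists $host{Name} ? 1 : 0;   # 1
delete $host{Name};                    # the key itself is removed
my $exists_after_delete = exists $host{Name} ? 1 : 0;   # 0

print "@keys ($count keys)\n";
```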
References
Here’s how to return hash references:
|
|
Input/Output
The Diamond Operator
This operator (<>) lets you read multiple inputs as if they were one large file:
|
|
This operator is generally used to process all input. Using it multiple times in a program would be an error.
@ARGV
@ARGV is an array containing the calling arguments. The diamond operator will first read the @ARGV array; if the list is empty, it will read the standard input, otherwise the list of files it finds.
Standard Output
Normally, you understand that there are differences between displaying and interpolating:
|
|
printf
Printf lets you better control the output. To display a number in a generally correct way, we will use %g, which automatically selects decimal, integer, or exponential notation:
|
|
%d means a decimal integer:
|
|
Printf is often used to present data in columns. We’ll define spaces:
|
|
Here we have 4 spaces then the number 32 which gives 6 characters.
Same with %s which is dedicated to strings:
|
|
If we make the 10 negative, we’ll have left alignment:
|
|
With %f (floating-point numbers), it can round to a given number of decimal places:
|
|
If we want the % sign to be mentioned then we need to have %%:
|
|
In the case where we want to enter the value to truncate in STDIN, don’t forget to interpolate it:
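The formats discussed in this section can be checked with sprintf, which builds the same string that printf would print. This is only a sketch with hypothetical values, and $width stands in for a value that might have come from STDIN:

```perl
use strict;
use warnings;

my $general = sprintf("%g %g %g", 2.50, 3, 1e30);   # "2.5 3 1e+30"
my $integer = sprintf("%d", 17.85);                 # "17" (fractional part dropped)
my $right   = sprintf("%6d", 32);                   # "    32" (4 spaces + 2 digits)
my $left    = sprintf("%-10s|", "toto");            # "toto      |" (left aligned)
my $rounded = sprintf("%.2f", 2.5678);              # "2.57"
my $percent = sprintf("%d%%", 50);                  # "50%"

my $width   = 8;                                    # could have been read from STDIN
my $dynamic = sprintf("%${width}s", "tata");        # format becomes "%8s"

print "$general\n$integer\n$right\n$left\n$rounded\n$percent\n$dynamic\n";
```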
For more information: http://perldoc.perl.org/functions/sprintf.html
Regex
Regex or regular expressions are like another language to learn, which very easily provide access to very sophisticated tools.
If for instance we want to do a search and display the matching expression (equivalent to grep):
|
|
Metacharacters
Metacharacters are used to push searches even further. If I take the example above:
|
|
The character “.” means any character (except a new line). So here, it matches tata, titi, and toto. If you really want a literal “.” and not the regex metacharacter, you need to escape it with the backslash “\”, which gives “\.”:
|
|
Quantifiers
The * sign means that the previous character is repeated any number of times, or not at all:
|
|
If we want anything between toto and titi, just add a “.” before the “*”:
|
|
To repeat the previous element 1 or more times, use “+”:
|
|
The ? indicates that it’s not mandatory:
|
|
Groupings
We can group with ():
|
|
Pipe
The pipe is used to designate one element or another:
|
|
If one of them matches, it is taken. If we want to search for spaces or tabs:
|
|
Character Classes
- To designate ranges or some letters or numbers, we’ll use []:
|
|
To avoid certain characters, add ^:
|
|
Now, even better! Some classes appear so often that they have been further simplified:
Operator | Description | Example |
---|---|---|
^ | Beginning of line | ^Deimos for ‘Deimos Fr!!!’ |
$ | End of line | !$ for ‘Deimos Fr!!!’ |
. | Any character (except a new line) | D.im.s for ‘Deimos Fr!!!’ |
* | Repetition of previous character from 0 to x times | !* for ‘Deimos Fr !!!’ |
+ | Repetition of previous character from 1 to x times | !+ for ‘Deimos Fr !!!’ |
? | Repetition of previous character from 0 to 1 time | F? for ‘Deimos Fr!!!’ |
\ | Escape character | \. for ‘Deimos Fr.’ |
a,b,…z | Specific character | Deimos for ‘Deimos Fr’ |
\w | Word character (a…z, A…Z, 0…9 and _) | \weimos for ‘Deimos Fr’ |
\W | Anything except a word character | I\Wll for ‘I’ll be back’ |
\d | A digit | \d for 1 |
\D | Anything except a digit | \Deimos for Deimos |
\s | A spacing character such as: space, tab, carriage return, line feed or form feed (\t,\r,\n,\f) | Deimos\sFr for ‘Deimos Fr’ |
\S | Anything except a spacing character | \Seimos for ‘Deimos Fr’ |
{x} | Repeats the previous character exactly x times | !{3} in ‘Deimos Fr !!!’ |
{x,} | Repeats the previous character at least x times | !{2,} in ‘Deimos Fr !!!’ |
{0,x} | Repeats between 0 and x times the previous character | !{0,3} in ‘Deimos Fr !!!’ |
{x,y} | Repeats between x and y times the previous character | !{1,3} in ‘Deimos Fr !!!’ |
[] | Allows to put a range (from a to z: [a-z], from 0 to 9: [0-9]…) | [A-D][0-5] in ‘A4’ |
[^] | Allows to specify unwanted characters | [^0-9]eimos in ‘Deimos’ |
() | Allows to record the content of parentheses for later use | (Deimos) in ‘Deimos Fr’ |
\| | Alternation (one alternative or another) | (Org\|Fr\|Com) in ‘Deimos Fr’ |
There’s a site that allows you to visualize a regex: Regexper (sources: https://github.com/javallone/regexper)
So:
|
|
Isn’t that beautiful? To represent a space, we can also use \s:
|
|
- Now, if we want the opposites:
|
|
or we can use uppercase:
|
|
Pretty neat, right! :-)
We can also find [\d\D], which means any digit or non-digit, in other words any character at all (unlike the ., which does the same except that it doesn’t match new lines).
General Quantifiers
If we want to match a pattern multiple times:
|
|
Which can give for example, if we’re looking for an 8-character word:
|
|
Anchors
As there are too few characters, some are reused:
Searches | Anchors |
---|---|
Beginning of a line | /^My\sstart/ |
End of a line | /my\send$/ |
Word Anchors
To define a whole word, we’ll use \b:
|
|
To reverse the order of things, so if we want anything except toto:
|
|
But we might want titine and titi:
|
|
Super-timorous, even stronger:
|
|
Memorization Parentheses
A good example and we understand better. If we use:
|
|
To memorize this regex, we’ll use parentheses:
|
|
To reference it, we’ll use:
|
|
This is not simple to understand, but a back reference (such as \1) looks for the same text that the corresponding parentheses just matched.
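A minimal sketch of a back reference, assuming the usual \1 notation and two hypothetical test words. The pattern /(.)\1/ only matches when a character is immediately followed by the same character:

```perl
use strict;
use warnings;

my $word    = "hello";
my $doubled = "";
if ($word =~ /(.)\1/) {      # (.) memorizes one character, \1 demands the same one again
    $doubled = $1;           # "l", from the "ll" in "hello"
}

my $has_double = ("world" =~ /(.)\1/) ? 1 : 0;   # 0: no doubled character

print "$doubled $has_double\n";
```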
For example if we have HTML code:
|
|
We can make our search with this:
|
|
Given the complexity of the thing, I’ll try to show good examples. First of all, you need to count the opening parentheses, this will be our regex memory number (e.g.: ((… = 2, because 2 opening parentheses):
|
|
Non-memorization Parentheses
If you want to use parentheses without them being stored in memory, you need to use these symbols “?:” like this:
|
|
Here, toto or tata won’t be stored.
Using Regex
Just as we’ve seen with the qw// operator, it’s possible to do the same thing for matches with m//:
|
|
In short, the possible delimiters include ^, !, (), <>, {} and []. With plain slashes (//), the m prefix is optional.
Ignoring Case
To ignore case, there’s /i:
|
|
Ignoring Spaces and Comments
If you want to ignore all spaces and comments in your code, add /x:
|
|
Matching Any Character
The fact that the . doesn’t match a new line character can be annoying. That’s why /s is useful. For example:
|
|
We can even make combinations:
|
|
Searching up to a Specific Character
I struggled a lot with this regex before finding it. If for example, I have a line like:
|
|
And I want to search for the content of name:
|
|
This regex won’t be enough because it will give me:
|
|
To fix the problem, here’s the solution:
|
|
I simply added a ? after the quantifier, which makes it non-greedy: the search stops not at the last “, but at the first one!
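The greedy and non-greedy behaviors can be compared on a hypothetical input line (the line and variable names below are assumptions for this sketch):

```perl
use strict;
use warnings;

my $line = 'name="toto" host="toto-portable"';   # hypothetical input

my ($greedy) = $line =~ /name="(.*)"/;    # runs up to the LAST quote
my ($lazy)   = $line =~ /name="(.*?)"/;   # stops at the FIRST quote

print "greedy: $greedy\n";   # greedy: toto" host="toto-portable
print "lazy:   $lazy\n";     # lazy:   toto
```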
The =~ Binding Operator
Matching with $_ is the default:
|
|
If no binding operator is indicated, it will work with $_.
Here’s another example of matching, but with regular expression memories:
|
|
One last one:
|
|
The memories remain intact until the next successful match: a failed match leaves them unchanged, whereas a successful match resets all of them. If you start playing too much with memories, you may have surprises. It is therefore advised to store them in variables:
|
|
Now, watch out, we’ll see the kind of things that I find great with Perl:
|
|
- $&: the match
- $`: what is before the match
- $': what is after the match
In conclusion, if we want the original string:
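A short sketch with a hypothetical sentence: gluing the three variables back together, in order, rebuilds the string that was matched against:

```perl
use strict;
use warnings;

my $text = "Hello there, neighbor";
my ($before, $match, $after) = ("", "", "");

if ($text =~ /\s(\w+),/) {
    # Copy the special variables right away: the next successful match resets them
    ($before, $match, $after) = ($`, $&, $');
}

my $rebuilt = $before . $match . $after;

print "[$before][$match][$after]\n";   # [Hello][ there,][ neighbor]
```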
These “magic” variables have a price! They slow down subsequent regular expressions. This can make you lose a few milliseconds…minutes, depending on your program. Apparently, this problem is fixed in Perl 6.
If you can use the numbered memories ($1, $2…) instead of these variables, don’t hesitate!
Know that you can get even more info on regex here: http://www.perl.com/doc/manual/html/pod/perlre.html
Substitution
For substitution (replacement), we’ll use s///:
|
|
Or, more simply:
|
|
Now, some slightly more complex examples:
|
|
- For a global substitution, that is, on all occurrences found, simply add “g”:
|
|
It’s possible to use other delimiters (like for m and qw) such as “{}, [], <>, ##”:
|
|
It’s also possible to combine “g” with another (like case sensitivity for example):
|
|
- To replace with uppercase, use \U:
|
|
- To replace with \L for all lowercase:
|
|
- You can disable case modification with \E:
|
|
- Written in lowercase (\l and \u), these escapes only affect the following character:
|
|
You can also combine them so that everything is lowercase except the first letter:
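The case escapes can be tried on a hypothetical sentence. In this sketch each substitution works on its own copy of $text so the results can be compared:

```perl
use strict;
use warnings;

my $text = "i saw TOTO with tata";

(my $upper = $text) =~ s/(toto|tata)/\U$1/gi;      # "i saw TOTO with TATA"
(my $lower = $text) =~ s/(toto|tata)/\L$1/gi;      # "i saw toto with tata"
(my $first = $text) =~ s/(toto|tata)/\u\L$1/gi;    # "i saw Toto with Tata"

# \E stops the effect of \U: only $2 is uppercased here
(my $partial = $text) =~ s/(\w+) with (\w+)/\U$2\E with $1/;   # "i saw TATA with TOTO"

print "$upper\n$lower\n$first\n$partial\n";
```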
You can even do this in a simple print:
|
|
split
Split allows cutting based on a space, tab, period… pretty much any separator. (Real CSV data with quoted fields is the exception: a simple split on commas does not handle it properly.)
|
|
Split drags the pattern through a string and returns the list of fields delimited by the separators. Each match of the pattern corresponds to the end of one field and the beginning of another:
|
|
As I mentioned above, it’s also possible to make separations with spaces:
|
|
By default, if no arguments are given, split breaks $_ on whitespace (like “\s+”, but ignoring leading whitespace).
join
Join does the opposite of split: it glues pieces together, with or without a separator. Note that its first argument is a plain string, not a pattern:
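A minimal sketch of split and join working together, on hypothetical strings:

```perl
use strict;
use warnings;

my @fields = split /:/, "abc:def:g:h";          # ("abc", "def", "g", "h")
my @holes  = split /:/, "abc:def::g:h";         # ("abc", "def", "", "g", "h")
my @words  = split ' ', "  toto tata\ttiti\n";  # whitespace split: ("toto", "tata", "titi")

my $dashed = join "-", @fields;                 # "abc-def-g-h"
my $glued  = join "", @words;                   # no separator: "tototatatiti"

print scalar(@holes), " fields, $dashed, $glued\n";
```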
or even:
|
|
More Complex Control Structures
unless
Unless is the opposite of if: the block is entered only if the condition is false. It’s actually equivalent to the else part of an if, or to negating the condition of an if:
|
|
It is also, just like an if, possible to use else with unless, but I don’t recommend it as it’s often a source of errors.
until
If you want to invert the condition of the while loop:
|
|
This is actually a disguised while loop that repeats as long as the condition is false.
Expression Modifiers
For a more compact notation, an expression can be followed by a modifier that controls it:
|
|
Bare Block
This is a bare block:
|
|
It’s a block that will be executed only once. The advantage is that we can create variables, but they will only be kept in this block.
elsif
In an if statement, if we want to have multiple alternatives, we can use elsif:
|
|
We can put as many elsif as we want (see perlfaq to emulate case or switch).
Auto Increment/Decrement
As in C, to increment a variable for example:
|
|
Same with “--” for decrementing
for
For is quite classic and resembles C again:
|
|
Another example. Imagine that we want to count from -150 to 1000 but in steps of 3:
|
|
Otherwise, a simple for loop for a successful search:
|
|
Be careful with infinite loops if you use variables:
|
|
If you really want to write an infinite loop, the best way is this:
|
|
foreach
The for and foreach keywords are interchangeable: if there’s no “;” inside the parentheses, it’s a foreach loop:
|
|
So it’s a foreach loop but written with a for.
Loop Controls
“last” allows to terminate a loop immediately (like break in C or shell):
|
|
As soon as a line contains the “END” marker, the loop ends.
next
Sometimes, you’re not prepared for the loop to end, but you’ve finished the current iteration. This is where “next” comes in! It jumps to the inside of the bottom of the loop, then it goes to the next iteration of the loop:
|
|
redo
Redo indicates to go back to the beginning of the current loop block without going further in the loop:
|
|
A small test to understand well:
|
|
Labeled Blocks
Labeled blocks are used to work with loop blocks. They are made of letters, underscores, numbers but cannot start with the latter. It is advised to name them with capital letters. In reality, labels are rare. But here’s an example:
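A sketch of a labeled loop. The label ROW and the numbers are hypothetical. The label lets next and last act on the outer loop from inside the inner one:

```perl
use strict;
use warnings;

my @found;

ROW: foreach my $row (1 .. 3) {
    foreach my $col (1 .. 3) {
        next ROW if $col > $row;         # jump to the next iteration of the OUTER loop
        last ROW if $row * $col == 6;    # leave both loops at once
        push @found, "$row$col";
    }
}

print "@found\n";   # 11 21 22 31
```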
Logical Operators
Like in shell:
- && (AND): executes what follows if the previous condition is true. Also allows to say that the expression before and the one that will follow must be validated to perform what follows.
- || (OR): executes what follows if the previous condition is false. Also allows to say that if the expression before doesn’t match, the following must match to be able to continue.
|
|
We can also write like this:
|
|
They are also called short-circuit operators, because the right operand is only evaluated when the left one doesn’t already decide the result. In the example below, the test on the left protects the expression on the right from a division by 0:
|
|
Unlike in C, the value of a short-circuit operator is the last part evaluated, not a simple boolean value.
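A minimal sketch of both properties, with hypothetical data. The result is the last operand evaluated, and the right side is skipped when the left side already decides:

```perl
use strict;
use warnings;

my %last_name = (toto => "Dupont");                 # hypothetical data

# || returns the left value if it is true, otherwise the right one
my $known   = $last_name{toto} || "(no last name)"; # "Dupont"
my $unknown = $last_name{tata} || "(no last name)"; # "(no last name)"

# && stops at the left operand when it is false: the division never runs
my $hours = 0;
my $rate  = ($hours != 0) && (100 / $hours);        # false value, no division by zero

print "$known / $unknown / ", ($rate ? "rate=$rate" : "no rate"), "\n";
```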
Ternary Operator
|
|
The ternary operator looks like an if-then-else. We first check if the first expression is true or false, and only one of the 2 right expressions is evaluated:
- If the 1st expression is true, the 2nd is evaluated and the 3rd ignored.
- If the 1st is false, the 2nd is ignored and the 3rd evaluated as the value of the whole.
|
|
Here’s another example:
|
|
A slightly more elegant example:
|
|
One last example:
|
|
File Handles and File Tests
File handles are named like other Perl identifiers (with letters, digits, and underscores, without starting with a digit) but since they don’t have a prefix, they can be confused with current or future reserved words. It is therefore advised to use only capital letters for a file handle name.
Today there are 6 file handle names used by Perl for its own use:
- STDIN
- STDOUT
- STDERR
- DATA
- ARGV
- ARGVOUT
You may see these handles written in lowercase in some scripts, but that doesn’t always work, which is why it’s advised to put everything in uppercase.
A program’s output is called STDOUT, and we’ll see how to redirect this output. You can take a look at the Perlport documentation.
Running Programs
Here are 2 ways to run your program with redirections:
|
|
This will read the input toto and send the output to titi.
|
|
This will take the input toto, send it to my soft and grep the output of my soft.
For error redirection:
|
|
Opening a Handle
Here’s how to open files:
|
|
Here are 2 methods of reading a file. For security reasons, I strongly advise you to use the last method.
|
|
This is for writing to a file.
|
|
This is also for writing to a file, but added to the end of an already existing file. Here’s an example of use:
|
|
It’s clearer to make a scalar variable and say what we’re going to do.
Closing a Handle
To close a file handle, simply do:
|
|
By default Perl will close all your files when your script closes.
Handle Problems
You may encounter problems when opening a file, such as permission issues or others. Here’s an example to avoid mistakes:
|
|
For a fatal error, we have another solution that is more common (die):
|
|
The sign “$!” is used to display the readable form of the complaint from the system.
Here are generally the possible return states:
- 0: everything went well
- 1: is a syntax error in the arguments
- 2: is an error produced during processing
- 3: the file was not found
die automatically appends the program name and the line number to the message. If you don’t want them, end the message with a new line, like this:
|
|
This line will analyze the number of arguments. If it is greater than 2 this program will terminate.
We can also do like this, which is the most used solution:
|
|
You can also replace “or” with “||”, but then you’ll need to call open with parentheses.
Note: older code sometimes uses the higher precedence “||”. The difference is that when “open” is written without parentheses, “||” binds to the filename argument, not to the return value, so the return value of open is never checked. If you use “||”, make sure to add the parentheses. The ideal remains to use “or” to avoid unwanted effects.
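The open modes and the “or die” idiom can be sketched as follows. Two things here are assumptions of this example rather than the original listing: the lexical file handles ($out, $in) and the three-argument form of open. The scratch file comes from the core File::Temp module:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file so the sketch does not touch real data
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

open(my $out, '>', $path) or die "Cannot write to $path: $!";    # create or overwrite
print $out "first line\n";
close $out;

open($out, '>>', $path) or die "Cannot append to $path: $!";     # add at the end
print $out "second line\n";
close $out;

open(my $in, '<', $path) or die "Cannot read $path: $!";         # read
my @lines = <$in>;
close $in;
chomp @lines;

print scalar(@lines), " lines: @lines\n";
```

With “or”, the test applies to the return value of open whether or not the parentheses are written.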
Reading a Text Block
You may need to read a text block to interpret it afterwards. The idea is to make the input, then press CTRL+D afterwards to say that you’ve finished the input. Here’s how to proceed:
|
|
Warnings with warn
You can use warn just like die except that instead of quitting the application radically, it will give you a warning message.
Using File Handles
After opening a file for reading, you can read the lines as if it was the standard input STDIN:
|
|
We can also add to the end of a file:
|
|
Here the content of the print is added to the end of rc.local. Another example for an error:
|
|
HEREDOC
HEREDOC serves to include a multi-line text literally in the program, the syntax is as follows:
|
|
There are subtleties, for example for a string between apostrophes (quote ‘), the variables are not interpolated while when we put it between quotation marks (double quote “) the variables are interpolated! Well for a HEREDOC, the behavior of variables is determined by what you put around the marker:
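A minimal sketch of the two behaviors. The marker name END and the variable $user are hypothetical:

```perl
use strict;
use warnings;

my $user = "toto";

# Marker in double quotes: variables are interpolated
my $interpolated = <<"END";
Hello $user
END

# Marker in single quotes: the text is taken literally
my $literal = <<'END';
Hello $user
END

print $interpolated;   # Hello toto
print $literal;        # Hello $user
```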
Replacing the Default Output File Handle
By default, printf prints on the default output. The default output can be modified with the select operator.
|
|
From the moment you’ve selected a file handle as the default output, it remains so (for more info man perlfunc).
Also by default, a file handle’s output is buffered. By initializing the special variable “$|” to 1, you set the selected file (the one selected when the variable is modified) so that the buffer is always flushed after each output operation. Thus, to make sure that the log file will immediately receive all its entries (in case you read the log to monitor the progress of a long program) you can for example write:
|
|
Reopening a Standard File Handle
If you’ve already opened a handle (opening a LOG when a LOG is already open), the old one would be automatically closed. Just as you can’t use the 6 standards (except in exceptional cases). Messages such as warn or die will go directly to STDERR.
|
|
File Tests
Maybe you’d like to know how to check if a file exists, its age or other:
|
|
You can see that we didn’t put $! here, because it’s not the system that rejected a request, but my file that already exists. For regular updating, if you want to check a file is not older than 4 days:
|
|
For find users, this will please you, imagine that we don’t want files larger than 100k, we move them to a folder if it has been unused for at least 90 days:
|
|
Here are the possibilities:
The tests -r, -w, -x and -o apply to the user running the Perl script. Also be careful with certain system limitations: -w can be true even though writing is impossible, for example when the file is on a CD, because the disc is mounted read-only.
Another thing to be careful about is symbolic links, which can be deceiving. That’s why it is better to test for the presence of a symbolic link before testing what interests you.
For the age tests, the result can be a floating-point number, and even a negative one (if the file was modified after the script started, for example).
The -t test returns true if the handle is a TTY, that is, if it can interact with the user, for example “-t STDIN”.
For -r and the other file tests, if you forget to specify a file, $_ will be taken into account.
If you want to transform a size into KB be sure to put the parentheses:
|
|
The stat and lstat Functions
Stat allows to obtain a lot of information about a file:
|
|
- $dev: Device number
- $ino: Inode number
- $mode: Gives the file permissions like ‘ls -l’ would give
- $nlink: Number of hard links. Always worth 2 or more for folders and 1 for files.
- $uid: User ID
- $gid: Group ID
- $size: the file size in bytes (like -s)
- $atime: Equivalent to -A
- $mtime: Equivalent to -M
- $ctime: Equivalent to -C
Invoking stat on a symbolic link returns info on the file the link points to. To get information about the symbolic link itself, you can use lstat. However, if the operand is not a symbolic link, lstat gives you the same info as stat.
Localtime
This function allows to convert unix time to human readable time:
|
|
or:
|
|
For GMT time, you can use the gmtime function:
|
|
If you want a concrete example for instance to not be able to run a script during production hours:
|
|
Bit by Bit Operators
This is useful for doing binary calculations, such as the values returned by the stat function:
|
|
Using the Special “_” File Handle
Each file test, stat or lstat is a system call. Testing two things on the same file therefore makes 2 system calls:
- 1: to know if it’s possible to read
- 2: to know if it’s possible to write
But we lose time when we make requests to the system. The goal is therefore to ask only once with “_” and then reuse this to get the new information. It looks a bit ridiculous like this but if you make a lot of system calls, you’ll see that with this your program will be much faster:
|
|
The first test uses $_ (or a file name); it is not more efficient than usual, because it has to get the info from the operating system. The following tests use the magic file handle “_”, which reuses the data left over from that previous call instead of asking the system again. Here we optimize the requests.
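A sketch of file tests with the “_” handle, on a scratch file created with the core File::Temp module (the file and its 5-byte content are assumptions for this example). It also shows the parentheses needed around -s before a division:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "12345";                  # 5 bytes
close $fh;

my $exists = -e $path ? 1 : 0;      # first test: asks the operating system
my $size   = -s _;                  # reuses the data from the previous test: 5

# One system call for -r, none for -w thanks to "_"
my $read_write = (-r $path && -w _) ? 1 : 0;

# Without the parentheses, Perl would test the file named ($path / 1024)
my $size_kb = (-s $path) / 1024;

print "exists=$exists size=$size rw=$read_write kb=$size_kb\n";
```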
Directory Operations
Moving in the Directory Tree
To move (equivalent to “cd” in shell):
|
|
As this is a system request, the value of $! is initialized in case of error.
Globbing
Here’s an example of globbing (in shell):
|
|
In perl it’s quite similar actually, imagine we have an array with all sorts of files:
|
|
and here’s how to bring out what interests us:
|
|
If we want multiple searches, just separate them with spaces:
|
|
Here’s another type of syntax:
|
|
equivalent to:
|
|
another example that needs no comment:
|
|
Directory Handles
If we want to list the content of a folder:
|
|
The result will be unsorted files or folders (even files starting with a .). If now, we want to get only files ending with pm, we’ll need to do like this:
|
|
If we had wanted everything except . and ..:
|
|
Now, if you want to recurse into subdirectories, I recommend the File::Find module.
Manipulating Files and Directories
Deleting Files
The equivalent of rm is unlink:
|
|
Another example, with glob:
|
|
To validate the deletion of files:
|
|
We’ll see here if 0 or 3 files have been deleted, but not 1 or 2. We’ll need to make a loop in case you absolutely want to know:
|
|
Renaming Files
Here are 2 examples:
|
|
We can also proceed with a loop and elegantly rename:
|
|
Links and Files
To find out which is the source file of a symbolic link:
|
|
Creating and Deleting Directories
To create a folder it’s very simple:
|
|
Be careful however if you want to assign permissions to variables because this won’t work like this:
|
|
And there’s the catastrophe because our folder has weird permissions like 01363. All because the string is by default in decimal and we need octal, to solve this thorny problem:
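One way to solve it is the oct function, sketched here in a scratch directory created with the core File::Temp module. The directory name toto and the variable names are hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $wanted = "0755";               # permissions arriving as a string

my $wrong = $wanted + 0;           # read as decimal 755, i.e. 01363 in octal
my $right = oct($wanted);          # read as octal: 493 in decimal, i.e. 0755

my $base = tempdir(CLEANUP => 1);  # scratch directory, removed at exit
mkdir "$base/toto", $right or die "Cannot create $base/toto: $!";
my $created = -d "$base/toto" ? 1 : 0;

printf "wrong=%o right=%o created=%d\n", $wrong, $right, $created;   # wrong=1363 right=755 created=1
```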
Now if we want to delete a folder, it’s simple:
|
|
This will delete all empty directories in the toto folder. Rmdir handles one directory per call and returns true on success. Rmdir will only delete a directory if it is empty, so remove its files with unlink before rmdir:
|
|
If this alternative seems too boring, use the File::Path module.
Determining the Process
To get the ID of the currently running process (PID), use the $$ variable. When you create a file for example, that gives:
|
|
Changing Permissions
To change permissions, simply, like in shell use chmod:
|
|
If you want to use u+x or that sort of thing, refer to http://search.cpan.org/~pinyan/File-chmod-0.32/chmod.pm
Changing the Owner
Once again it’s like in shell, we use chown:
|
|
If you don’t want to use the uid and gid to make the change and prefer names, then do like this:
|
|
The defined function checks that the return value is not undef.
Changing Date and Time
Sometimes you want to lie to certain programs, here’s how to change access and modification time:
|
|
This can be very useful in case of problems with backups.
The File::Basename Module
If we want to get the path of a binary for example, we’ll need this module, here’s how it works:
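A minimal sketch on a hypothetical Unix-style path, using the basename and dirname functions that File::Basename exports by default:

```perl
use strict;
use warnings;
use File::Basename;

my $path = "/usr/local/bin/perl";   # hypothetical path

my $name = basename($path);         # "perl"
my $dir  = dirname($path);          # "/usr/local/bin"

print "$name lives in $dir\n";
```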
Using Only Certain Functions from a Module
Imagine that you have a function with the same name as a function of one of your modules. To load only what you need, here’s how to do it:
|
|
The File::Spec Module
With the File::Basename module, it’s convenient, you know what you need to get a file, but if you want to get the complete path where your file is, you’ll need to use the File::Spec module.
Process Management
Calling external programs can be very practical when you don’t have time to rack your brain or simply when you have no choice.
System
This one is my favorite, because it allows launching a child process of your Perl program. If you need to fork your Perl program with a command, the system function is very convenient. Personally I had to develop in Perl for a generic script (GDS) on SunPlex (Sun Cluster) for the company I work for, and I needed to fork at one point. I was delighted to see that system did it.
It is however important to understand that the output of the command launched by system goes to STDOUT and is not returned to your Perl program:
|
|
Note that single quotes protect values meant for the shell, while double quotes let your Perl program interpolate its own variables.
Another problem lies in the command you call: your Perl program waits for the command to finish, so if it runs for a long time (or asks for a confirmation, etc.), your program is blocked. To get around this, add a & to run it in the background:
|
|
If you are on Windows, here’s the solution to adopt:
|
|
The system function is well designed in that it doesn’t launch a shell for a simple command. But if the command contains shell metacharacters such as $, ; or |, then a shell is launched to execute it.
Avoiding the Shell
If you can avoid the shell call, it’s not bad. Here’s an example:
|
|
While there’s a “cleaner” way that doesn’t call the shell:
|
|
If now, I want to use the return values to see if everything went well:
|
|
or
|
|
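It’s generally useless to print $! here: when the external command itself fails, the error doesn’t happen in Perl, so Perl cannot tell what went wrong. The command’s exit status is available in $? instead.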
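To tie these points together, here’s a sketch that calls system with a list (so no shell is involved) and then decodes $?. To stay portable I launch perl itself through $^X; the exit code 3 is an arbitrary value chosen for the demo:

```perl
use strict;
use warnings;

# List form: the program and each argument are separate elements,
# so no shell is started. $^X is the path of the running perl.
my @command = ($^X, '-e', 'exit 3');

my $result = system(@command);
my $exit_code;

if ($result == -1) {
    # The command could not even be started; here $! is meaningful.
    print "Could not launch the command: $!\n";
}
elsif ($? & 127) {
    # The low 7 bits hold the signal number if the child was killed.
    printf "The command died from signal %d\n", $? & 127;
}
else {
    # The exit code of the command is in the high byte of $?.
    $exit_code = $? >> 8;
    print "The command exited with code $exit_code\n";
}
```

A return value of 0 from system means success, which is why you often see system(...) == 0 or die.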
Exec
The exec function works like system except for one very important point: it doesn’t create a child process, it replaces the Perl program with the command.
When your code reaches the exec, the process jumps into the command, which keeps the PID of your Perl program until the end of its execution. Control then returns not to Perl, but generally to the shell from which you launched the Perl script.
Environment Variables
In Perl, there’s a hash called %ENV containing the environment variables inherited from the parent process (usually the shell that launched your program). For example:
|
|
This lets you modify the PATH seen by the child processes you’re going to launch. It’s quite convenient, but it obviously has no effect on the parent process.
As a reminder, under Linux the command to see environment variables is “env” and under Windows it’s “set”.
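Here’s a sketch showing that a child process really inherits what you put in %ENV. The variable name DEMO_GREETING is invented, and the child is simply another perl started through $^X:

```perl
use strict;
use warnings;

# Hypothetical variable, set only for this program and its children.
local $ENV{DEMO_GREETING} = "hello from the parent";

# Launch a child perl that prints the variable it inherited.
# The list form of open avoids any shell quoting problem.
open my $child, '-|', $^X, '-e', 'print $ENV{DEMO_GREETING}'
    or die "Cannot launch child: $!";
my $seen = <$child>;
close $child;

print "The child saw: $seen\n";
```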
Using ` to Capture Output
Here’s an example of what we can do by recovering the output:
|
|
Here’s another example:
|
|
Don’t overuse backticks, because Perl has to work a bit harder to capture the output. If you don’t need the output, prefer system.
If you want to capture errors too, add 2>&1:
|
|
If you need to use a command that might (without you wanting it) ask you a question, that will cause problems. To avoid this kind of inconvenience, redirect the command’s standard input from /dev/null:
|
|
Using ` in a List Context
In scalar context, backticks return the whole output as a single string, even if it spans several lines. In list context, you get one element per line. For example with the who command, it’s preferable to use an array:
|
|
Then for the analysis:
|
|
Notice that when =~ is not present, the match is automatically applied to $_.
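To compare the two contexts without depending on who, here’s a sketch where the external command is perl itself (through $^X), printing three lines. That command is my own stand-in:

```perl
use strict;
use warnings;

# Scalar context: one string containing all the lines.
my $all = `$^X -le "print for 1..3"`;

# List context: one element per line.
my @lines = `$^X -le "print for 1..3"`;
chomp @lines;

print "scalar context gave ", length($all), " characters\n";
print "list context gave ", scalar(@lines), " lines\n";

# As with who, each line can then be analyzed on its own.
foreach (@lines) {
    print "even number: $_\n" if /^[24]$/;   # no =~, so $_ is used
}
```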
Processes as File Handles
I really like this way of doing things, because it seems the clearest to me. You have to operate as if it was a file and not forget a | at the end of your command:
|
|
You can also put the | on the other side:
|
|
There’s not even a need to go that far anyway to do this, a simple print will do:
|
|
This use (with open) is more complex than backticks, however it lets you get the results as they arrive. This makes sense with the find command for example, which gives its results little by little. You can analyze and process them in real time, while with backticks you would have to wait until the end of the find command before processing anything:
|
|
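Here’s a sketch of reading a command through a file handle, line by line. Instead of find, I launch perl itself (via $^X) so the example runs anywhere; the '-|' mode plays the role of the trailing |:

```perl
use strict;
use warnings;

# '-|' means "read from this command", like a | at the end of the command.
# The list form passes the arguments without going through a shell.
open my $numbers, '-|', $^X, '-le', 'print for 1 .. 3'
    or die "Cannot launch the command: $!";

my @seen;
while (my $line = <$numbers>) {
    chomp $line;
    # Each line can be processed as soon as the command produces it.
    print "got: $line\n";
    push @seen, $line;
}

# close waits for the command to finish and sets $?.
close $numbers or warn "The command reported a problem (status $?)\n";
```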
Using fork
In addition to the high-level interfaces like above, we can in Perl make low-level system calls. For example fork, which is very practical. For example this:
|
|
in forked version gives:
|
|
The test against 0 distinguishes the two processes: fork returns 0 in the child and the child’s PID in the parent.
If you want more information, consult the perlipc manual.
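As a rough sketch of the fork pattern (on a Unix-like system), here the child just exits with an arbitrary code instead of running a real command:

```perl
use strict;
use warnings;

# fork returns undef on failure, 0 in the child, the child's PID in the parent.
defined(my $pid = fork) or die "Cannot fork: $!";

if ($pid == 0) {
    # Child process: a real program would usually exec a command here.
    exit 7;
}

# Parent process: wait for this particular child to finish.
waitpid($pid, 0);
my $child_status = $? >> 8;

print "Child $pid exited with status $child_status\n";
```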
Sending and Receiving Signals
The different signals are identified by a name (for example SIGINT, for “interrupt signal”) and an integer (ranging from 1 to 16, 1 to 32, or 1 to 63, depending on your flavor of Unix). For example, if we want to send a SIGINT:
|
|
The PID number here is 3094 and you can change 2 to INT if you want. Be careful about permissions, because if you don’t have the authorization to kill the process, you’ll get an error.
If the process no longer exists, kill returns false, which lets you know whether the process is still running or not.
If you simply want to check if a process is alive or not, you can test it with a kill 0:
|
|
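A tiny sketch of the kill 0 test, applied to our own process so that the answer is known in advance:

```perl
use strict;
use warnings;

# Signal 0 sends nothing; it only checks that the process can be signalled.
my $pid   = $$;             # our own PID, guaranteed to exist
my $alive = kill 0, $pid;   # number of processes successfully signalled

if ($alive) {
    print "Process $pid is alive\n";
}
else {
    print "Process $pid is gone (or belongs to someone else): $!\n";
}
```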
Signal interception may seem more interesting than sending. For example if someone does a Ctrl+C on your program and you still have temporary files that exist, maybe you would like these files to be deleted anyway. Here’s how to proceed:
|
|
The temporary files are created, the program runs etc… And at the end of the program we flush:
|
|
If a Ctrl+C occurs, the program jumps directly to the &flush subroutine.
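Here’s a sketch of such an interception on a Unix-like system. I simulate the Ctrl+C by sending SIGINT to the program itself, and I let the program continue after the cleanup so the result can be displayed; a real handler would usually exit:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file that must not survive an interruption.
my ($fh, $temp_file) = tempfile(UNLINK => 0);
close $fh;

my $flushed = 0;

# Cleanup routine, called flush as in the text.
sub flush {
    unlink $temp_file;
    $flushed = 1;
}

# Run flush when SIGINT (Ctrl+C) arrives.
$SIG{INT} = sub { flush() };

# Simulate a Ctrl+C by sending SIGINT to ourselves.
kill 'INT', $$;
select(undef, undef, undef, 0.1);   # give the handler a moment to run

print "Temporary file removed\n" if $flushed && !-e $temp_file;
```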
Strings and Sorts
When we need to search for text, regexes are very convenient, but they can sometimes be overkill. That’s why there are also simple string functions.
Locating a Substring with index
For example, here we’re looking for “mon”:
|
|
The return value here is 9, because there are 9 elements before finding “mon”. If no occurrence has been found, the return value will be -1.
|
|
We can also reverse the search with rindex:
|
|
And finally there’s an optional third parameter: for index it gives the position where the search starts, and for rindex the maximum authorized return value:
|
|
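To sum up index and rindex, here’s a sketch on a string of my own (not the one from the examples above):

```perl
use strict;
use warnings;

my $text = "Hello world!";

my $first   = index($text, "o");       # 4: first "o", counting from 0
my $next    = index($text, "o", 5);    # 7: search starts at position 5
my $last    = rindex($text, "o");      # 7: last "o" in the string
my $before  = rindex($text, "o", 6);   # 4: last "o" at position 6 or earlier
my $missing = index($text, "xyz");     # -1: not found

print "$first $next $last $before $missing\n";
# prints "4 7 7 4 -1"
```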
Manipulating Substrings with substr
substr takes 3 arguments:
- A string value
- An initial location based on 0
- The length of the substring
The return value contains the substring:
|
|
If we don’t put a 3rd parameter, it will go to the end of the string, regardless of the length.
To count from the end of the string instead of the beginning, use a negative position:
|
|
index and substr work very well together. For example, here we’ll extract a substring starting with the letter l:
|
|
Here’s now how to make the whole a little more flexible:
|
|
Here’s an even shorter way:
|
|
In reality, we never use this kind of code. But you might need it.
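Here’s a recap sketch of substr, alone and combined with index, again on a string chosen for the demo:

```perl
use strict;
use warnings;

my $text = "Hello world!";

my $word     = substr($text, 6, 5);                # "world"
my $to_end   = substr($text, 6);                   # "world!" (no length: up to the end)
my $from_end = substr($text, -6, 5);               # "world" (negative: counted from the end)
my $found    = substr($text, index($text, "w"));   # "world!" (start where "w" is found)

# With a 4th argument, substr replaces the selected part in the string.
my $copy = $text;
substr($copy, 0, 5, "Goodbye");                    # $copy is now "Goodbye world!"

print "$word | $to_end | $from_end | $found | $copy\n";
```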
Prefer the index and substr functions to regexes whenever they are enough, because they don’t have the overhead of the regex engine:
- They are never case insensitive
- They don’t care about metacharacters
- They don’t set any of the match variables
Formatting Data with sprintf
|
|
Here, $date_tag receives something like “2008/12/07 03:00:30”. The format string (the first argument of sprintf) places a 0 at the beginning of certain numbers, which we hadn’t mentioned when we studied the printf formats. This 0 asks to add leading 0s as requested to give the number the required width. Without this 0, there would be spaces instead: “2008/12/ 7 3: 0:30”.
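Here’s a sketch of such a call. To get a reproducible result I use fixed values laid out like the list returned by localtime, instead of the current time:

```perl
use strict;
use warnings;

# Same order as the first six elements of localtime:
# seconds, minutes, hours, day of month, month (0-11), year - 1900.
my ($sec, $min, $hour, $mday, $mon, $year) = (30, 0, 3, 7, 11, 108);

# %02d pads with leading zeros up to a width of 2.
my $date_tag = sprintf "%4d/%02d/%02d %02d:%02d:%02d",
    $year + 1900, $mon + 1, $mday, $hour, $min, $sec;

print "$date_tag\n";
# prints "2008/12/07 03:00:30"
```

Replace the fixed list with localtime to stamp the current date.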
Using sprintf with Monetary Numbers
To display a sum of money as 2.50, not 2.5 (and especially not 2.49997), the format “%.2f” easily gives this result:
|
|
The complete implications of rounding are numerous and subtle, but most often it is desirable to keep the numbers in memory with all possible precision, only rounding for display.
|
|
|
|
Why didn’t we simply use the /g modifier to perform a “global” search and replace and avoid the pain and confusion of the 1 while? Because we’re working backwards from the decimal point, not advancing from the beginning of the string. The placement of commas in a number like this cannot be accomplished by a s///g substitution alone.
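Here’s a sketch of a subroutine following the technique just described: round with sprintf, then insert commas from the right with 1 while. The name big_money and the sample amounts are assumptions of mine:

```perl
use strict;
use warnings;

sub big_money {
    # Round to two decimals for display only.
    my $number = sprintf "%.2f", shift;

    # Insert one comma per pass, before the last group of three digits
    # of the leading digit run; the loop stops when nothing matches anymore.
    1 while $number =~ s/^(-?\d+)(\d\d\d)/$1,$2/;

    # Put the dollar sign after an optional minus sign.
    $number =~ s/^(-?)/$1\$/;
    return $number;
}

print big_money(12345678.9), "\n";   # prints "$12,345,678.90"
print big_money(2.49997), "\n";      # prints "$2.50"
```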
Advanced Sorting
You can use cmp to create a more complex sort order, for example case insensitive:
|
|
|
|
If for example we want to sort by file modification/creation order:
|
|
Multiple Key Sorting
|
|
We want to rank the players above by score, and if the score is identical to another, alphabetically:
|
|
If the spaceship operator sees 2 different scores, that’s the desired comparison. It returns -1 or 1 (true value) and the short-circuit or indicates that the rest of the expression should be skipped and the desired comparison is returned. (Remember that the or shortcut returns the last evaluated expression). But if the spaceship operator sees 2 identical scores, it returns 0 (false value); the cmp operator takes over and returns an appropriate ranking value considering the keys as strings. If the scores are identical, the string comparison ends the competition.
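A sketch of this two-level sort, together with a case-insensitive sort. The players and scores are made up:

```perl
use strict;
use warnings;

# Hypothetical scores for the demo.
my %score = (barney => 195, fred => 205, dino => 30, bamm => 195);

# Highest score first; on equal scores, alphabetical order of the names.
my @winners = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;

print "@winners\n";
# prints "fred bamm barney dino"

# Case-insensitive sort: compare lowercased copies of the values.
my @names = sort { lc($a) cmp lc($b) } qw(wilma Betty fred Barney);

print "@names\n";
# prints "Barney Betty fred wilma"
```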
There’s no reason for your sorting subroutine to be limited to 2 levels of sorting. Below, the Bedrock library program ranks a list of customer ID numbers according to a 5-level sort order: each customer’s unpaid penalties (calculated by an absent subroutine here, &penalties), the number of items currently consulted (from %items), their name (in the order of last name then first name, both from hashes), finally the customer ID number, in case the rest is identical:
|
|
Simple Databases
Opening and Closing DBM Hashes
It’s quite simple to understand here:
|
|
Using a DBM Hash
The DBM hash looks like any other hash, but instead of being stored in memory, it’s stored on disk. Therefore, when your program opens it again, the hash already contains data from the previous call (some beginner docs will tell you to no longer use DBM bases and replace them with “tie”, however there’s no need to use complex methods when we wish to do something simple).
|
|
Manipulating Data with pack and unpack
Pack is used to pack data. We gather 3 numbers of different sizes into a 7-byte string using the formats c, s and l (reminiscent of char, short and long). The first number is packed into one byte, the second into 2 and the 3rd into 4:
|
|
It’s possible to improve readability by placing spaces in a format string and by using repeat counts: for example, you can write “ccccccc” as “c7”. With unpack, it’s often simpler to use “c*”.
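Here’s a sketch of a round trip with the c, s and l formats. The three numbers are arbitrary values that fit in 1, 2 and 4 bytes:

```perl
use strict;
use warnings;

# c = signed char (1 byte), s = signed short (2 bytes), l = signed long (4 bytes).
my $buffer = pack("c s l", 31, 4159, 265359);

print "The buffer is ", length($buffer), " bytes long\n";
# prints "The buffer is 7 bytes long"

# unpack with the same format gives the numbers back.
my ($small, $medium, $large) = unpack("c s l", $buffer);
print "$small $medium $large\n";
# prints "31 4159 265359"

# "c*" unpacks as many signed chars as there are bytes.
my @bytes = unpack("c*", $buffer);
```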
Random Access Databases of Fixed Length
There are several available formats in the pack documentation. So you have to choose the appropriate format. The open function has another mode that we haven’t presented yet. Placing “+<” before the file name parameter string is equivalent to using “<” to open the existing file for reading, while additionally requesting permission to write to the file:
|
|
Similarly, conversely, you can write to a file and then read it:
|
|
To summarize:
- “+<”: Allows to read an existing file, then write to it
- “+>”: Creates (or empties) a file to write to it, then read it
The latter is usually used for scratch files.
Once the file is opened, we must move around in it; for that we’ll use the seek function.
|
|
- 1st parameter: file handle
- 2nd parameter: the displacement in bytes from the beginning of the file
- 3rd parameter: the starting point (whence); 0 means the beginning of the file
|
|
- 1st parameter: file handle
- 2nd parameter: buffer variable to receive the read data
- 3rd parameter: number of bytes to read
We asked for 55 bytes because that corresponds to the size of our record. But if only 5 bytes remain before the end of the file, you’ll only get 5.
Here’s a small example of all this with a time function to give the time:
|
|
On some systems, it’s necessary to use seek at each transition from a read to a write, even when the current position in the file is correct. It’s therefore advised to use seek immediately before any read or write.
Here’s a small alternative to what we just saw:
|
|
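To see open “+<”, seek, read and print working together, here’s a self-contained sketch. It doesn’t use the 55-byte record from above: the record layout (a 10-character name plus a 32-bit number, so 14 bytes) and the data are invented for the demo:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Fcntl qw(SEEK_SET);

# Hypothetical record: name padded to 10 bytes + unsigned 32-bit score.
my $format = "A10 N";
my $size   = length pack($format, "", 0);   # 14 bytes per record

# Build a small database of three records.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} pack($format, @$_) for (["fred", 10], ["barney", 20], ["dino", 30]);
close $tmp or die "Cannot close $file: $!";

# Reopen it for reading and writing.
open my $db, '+<', $file or die "Cannot open $file: $!";
binmode $db;

# Read record number 1 (the second one).
my $n = 1;
seek($db, $n * $size, SEEK_SET) or die "Cannot seek: $!";
read($db, my $buffer, $size) == $size or die "Short read";
my ($name, $score) = unpack($format, $buffer);

# Go back to the start of the record and overwrite it.
seek($db, $n * $size, SEEK_SET) or die "Cannot seek: $!";
print {$db} pack($format, $name, $score + 5);

# Read it again to check the update.
seek($db, $n * $size, SEEK_SET) or die "Cannot seek: $!";
read($db, $buffer, $size);
my (undef, $new_score) = unpack($format, $buffer);
close $db;

print "$name: $score -> $new_score\n";
# prints "barney: 20 -> 25"
```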
Variable Length (text) Databases
The most common way to update a text file by program is to write an entirely new file similar to the old one, making the necessary changes along the way. This technique gives pretty much the same result as updating the file itself, but offers additional advantages.
That’s where Perl is once again great: with the diamond operator “<>” and the $^I variable, it’s possible to do the substitution directly on the file without having to manage a new file yourself:
|
|
To provide the list of files to the diamond operator, we read them in a glob. The next line initializes $^I. By default this is initialized to undef, but when it’s initialized to a certain string, it makes the diamond operator even more magical. From then on, the while loop reads a line from the old file, updates it then writes it to the new file. This program is able to update hundreds of files in a few seconds on a classic machine. Once the program is finished, Perl has automatically created a backup file of the original as if by magic.
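Here’s a sketch of that mechanism on a file created just for the occasion. The file name, its content and the substitution are my own choices:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a small file to edit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/notes.txt";
open my $out, '>', $file or die "Cannot create $file: $!";
print {$out} "colour\nflavour\n";
close $out;

{
    local @ARGV = ($file);   # files the diamond operator will read
    local $^I   = ".bak";    # enable in-place editing, keep a .bak copy

    while (<>) {
        s/our$/or/;          # the change to apply to each line
        print;               # goes to the new version of the file
    }
}

# Check the result.
open my $in, '<', $file or die "Cannot read $file: $!";
my $content = do { local $/; <$in> };
close $in;

print $content;
print "A backup was kept\n" if -e "$file.bak";
```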
In-place Editing from the Command Line
Imagine that you need to correct hundreds of files with a spelling mistake. Here’s the solution in a single line:
|
|
Some Advanced Perl Techniques
Intercepting Errors with eval
|
|
Now, even if $tutu equals 0, this line won’t crash the program: eval traps the fatal error. Since eval is an expression and not a control structure like while or foreach, its block must be followed by a semicolon. If an error occurred, you’ll find its message in $@.
|
|
It’s possible to nest eval blocks within other eval blocks.
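A minimal sketch of trapping a division by zero with eval. The variable names are mine, not those of the example above:

```perl
use strict;
use warnings;

my ($total, $count) = (10, 0);

# The block returns its last value, or undef if it died.
my $average = eval { $total / $count };
my $error   = $@;   # copy $@ right away, before anything resets it

if ($error) {
    print "An error occurred: $error";
}
else {
    print "The average is $average\n";
}
```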
Selecting List Items with grep
Let’s choose the odd numbers from a long list of numbers. We don’t need any new features:
|
|
This code uses the modulo operator. If a number is even, this number modulo 2 gives 0 which is worth false. But an odd number will give 1; as that equals true, only odd numbers will be placed in the array.
There’s nothing wrong with this code, except that it’s a bit long to write and slow to execute, if one knows that perl offers the grep operator:
|
|
Another example:
|
|
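For comparison, here’s a sketch of both forms of grep, with a block and with a plain expression. The lists are invented:

```perl
use strict;
use warnings;

# Keep the elements for which the block is true.
my @odd = grep { $_ % 2 } 1 .. 10;
print "@odd\n";
# prints "1 3 5 7 9"

# A simple expression followed by a comma also works, here a regex.
my @names = qw(fred barney Frederic dino);
my @freds = grep /fred/i, @names;
print "@freds\n";
# prints "fred Frederic"

# In scalar context, grep returns the number of matching elements.
my $count = grep { /fred/i } @names;
```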
Transforming List Elements with map
|
|
The map operator resembles grep because it has the same type of arguments: a block using $_ and a list of elements to process. It operates the same way, by evaluating the block once for each element of the list, $_ becoming the alias of a different original element of the list each time.
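A short sketch of map, with values of my own. Unlike grep, the result of the block is what ends up in the new list:

```perl
use strict;
use warnings;

my @amounts = (2.5, 3, 10.126);

# Each element is replaced by the result of the block.
my @formatted = map { sprintf "%.2f", $_ } @amounts;
print "@formatted\n";
# prints "2.50 3.00 10.13"

# The block may return several values, handy to build a hash.
my @names  = qw(fred barney dino);
my %length = map { $_ => length($_) } @names;
print "barney has $length{barney} letters\n";
# prints "barney has 6 letters"
```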
Hash Keys Without Quotes
Naturally this doesn’t apply to every key, since a hash key can be any string. But keys are often simple. If a hash key consists only of letters, digits and underscores, without starting with a digit, you can omit the quotes (this is called a bareword):
|
|
More Powerful Regular Expressions
Non-greedy Quantifiers
Regular expressions can consume more or less CPU depending on how you write them. The more backtracking the engine has to do before finding (or failing to find) a match, the longer your program’s execution time will be.
Here’s an example to remove tags, but there’s an error:
|
|
The asterisk (*) being greedy, it matches as much text as possible, so we won’t get what interests us. Here’s the right solution:
|
|
Placed after a quantifier, the ? makes it non-greedy: it matches as little text as possible instead of as much as possible (as the first example does).
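Here’s a sketch comparing the two behaviors on a sentence containing two pairs of tags. The text and the tag are chosen for the demo:

```perl
use strict;
use warnings;

my $text = "I <b>really</b> like <b>Perl</b>";

# Greedy: .* runs from the first <b> up to the last </b>.
(my $greedy = $text) =~ s{<b>(.*)</b>}{$1}g;
print "$greedy\n";
# prints "I really</b> like <b>Perl"

# Non-greedy: .*? stops at the nearest </b>.
(my $lazy = $text) =~ s{<b>(.*?)</b>}{$1}g;
print "$lazy\n";
# prints "I really like Perl"
```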
Multiline Text Recognition
The ^ and $ anchors normally match the beginning and the end of the whole string; if we also want them to match around internal newline characters, we’ll need to use “/m”:
|
|
That makes the anchors match at the beginning and end of each line, not only of the whole string.
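A quick sketch of the effect of /m on a three-line string of my own:

```perl
use strict;
use warnings;

my $text = "fred\nbarney\ndino\n";

# Without /m, ^ only matches at the very beginning of the string.
my @without = $text =~ /^(\w+)/g;
print "@without\n";
# prints "fred"

# With /m, ^ also matches just after each internal newline.
my @with = $text =~ /^(\w+)/mg;
print "@with\n";
# prints "fred barney dino"
```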
Slices
If for example we only want some of the values returned by stat, we can assign the ones we don’t need to undef:
|
|
If we indicate a wrong number of undef, we’ll unfortunately reach an atime or ctime, which is quite difficult to debug. However there is a better method! Perl knows how to index a list as if it was an array.
Here, as mtime is element 9 of the list returned by stat, we get it by an index:
|
|
Now, let’s see how with the list, we can split info:
|
|
This method is good but not efficient enough. Here’s how to do better:
|
|
If now we want to retrieve some elements from a list (-1 representing the last element):
|
|
Here’s an example of an extraction of 5 elements on a list of 10:
|
|
Array Slice
The example above even simplified would have given:
|
|
To enter information into an array easily when it’s currently variables, here’s the solution:
|
|
Hash Slice
Here’s another technique that works:
|
|
But again, this is not the most optimized solution. Here it is:
|
|
Why isn’t there a % when we’re talking about a hash? The % is the mark indicating an entire hash; a hash slice (like any slice) is always a list, not a hash (just like a house fire is a fire and not a house). In Perl, the $ sign means a single element, the @ a list of elements and the % an entire hash.
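To recap the three kinds of slices side by side, here’s a sketch with invented data (a line in the style of /etc/passwd and some scores):

```perl
use strict;
use warnings;

my $line = "fred:x:1000:1000:Fred Flintstone:/home/fred:/bin/sh";

# List slice: index directly into the list returned by split.
my ($user, $shell) = (split /:/, $line)[0, -1];

# Array slice: several elements of an array at once, hence the @.
my @fields = split /:/, $line;
my ($uid, $home) = @fields[2, 5];

# Hash slice: several values of a hash at once, with @ and curly braces.
my %score;
@score{qw(barney fred dino)} = (195, 205, 30);
my @two = @score{qw(fred dino)};

print "$user uses $shell, uid $uid, home $home\n";
print "@two\n";
# prints "205 30"
```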
Creating a Daemon
A daemon allows running an app in the background. Here’s how to do it in perl:
|
|
Creating Sockets
This part explains how to make a socket (client/server) work.
We’ll use a fork for each connection to not be limited. Otherwise, we would have to wait for the first connection to close before a second one is handled. Thanks to fork, a new child is created for each connection. It will handle it while its parent will wait for a new connection, ready to delegate it to another of its children.
The following code will be properly commented to give all necessary explanations.
Here’s what the server will look like:
|
|
The client, will look like this (much simpler):
|
|
It’s possible to code functions in both parts. The easiest thing to implement to make the server perform actions is to have it analyze what the client sends it. According to its correspondence with one regex or another, it will launch a certain function.
Going Further in Perl
First of all, know that if you need additional documentation, you can look at the following man pages:
- perltoc (table of contents)
- perlfaq
- perldoc
- perlrun
For regular expressions, you can also find what you need here:
- perlre
- perlretut (for tutorials)
- perlrequick
Modules
Modules are tools made to save you time. They allow you not to reinvent the wheel each time you want new functionalities. Abuse them! I invite you to go to the CPAN site and take a little tour.
To install a module, it’s very simple, as root launch the cpan command, then:
|
|
And everything will be done automatically :-)
If you encounter compilation problems, check the errors, but in many cases, you’re missing C development libraries:
|
|
In the very rare case where there’s no module to do what you want, you can develop one yourself in Perl or C (and submit it afterwards, think of the community!). Consult:
- perlmod
- perlmodlib
Listing Installed Modules
It can be very practical to list installed modules. For this we’ll need this package:
|
|
Then, all that’s left is to launch this command:
|
|
Knowing if a Module is Integrated by Default in Perl
To know if a module is integrated into the Perl Core, this command exists:
|
|
Example:
|
|
Some Important Modules
Cwd
Equivalent to the pwd command, it allows to know the directory where we are (.):
|
|
And if we want to know the perl file that is currently running (the one we launch):
|
|
Fatal
If you’re tired of writing “or die” for each invocation of open or chdir for example, the Fatal module is made for you:
|
|
Here no need to indicate “or die”, we’ll get thrown out if we couldn’t change folders.
Sys::Hostname
To know the name of the machine:
|
|
Time::Local
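Conversion between Epoch time and human readable time (localtime goes from Epoch time to a readable date, and timelocal from Time::Local goes back to Epoch time):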
|
|
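Here’s a sketch of the round trip in both directions. The date is arbitrary, and keep in mind that months are numbered from 0 in both functions:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Human readable -> Epoch: seconds, minutes, hours, day, month (0-11), year.
my $epoch = timelocal(30, 0, 3, 7, 11, 2008);

# Epoch -> human readable: localtime returns the year minus 1900.
my ($sec, $min, $hour, $mday, $mon, $year) = localtime($epoch);
my $readable = sprintf "%04d/%02d/%02d %02d:%02d:%02d",
    $year + 1900, $mon + 1, $mday, $hour, $min, $sec;

print "$epoch corresponds to $readable\n";
```

The Epoch value depends on your time zone, which is why I don’t show it here.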
diagnostics
To have more information about errors in your code:
|
|
Big Numbers
To have a calculation with big numbers, use Math::BigFloat or Math::BigInt.
splice
It allows to add or delete elements in the middle of an array.
Security
Perl has a feature that tracks data coming from outside your program (user input, environment variables, files) and refuses to use it in risky operations until you have checked it. This control is called taint checking (enabled with the -T switch). Also see perlsec.
Debugging
See the B::Lint module.
Converting from Other Languages
- To convert from sed: man s2p
- To convert from awk: a2p
find
To convert the find command to perl use the find2perl command:
|
|
Socket
If we want to know if a port is listening or not:
|
|
For more info: http://perldoc.perl.org/perlipc.html#Sockets%3a-Client%2fServer-Communication
Getopt::Long and Getopt::Std
This small module is one of my favorites because it allows to manage the entire @ARGV part without racking your brain. A small example:
|
|
Here, you define variables to indicate the arguments. For example:
- s: is of type string (here help is defined by the letter h or help)
- i: is of type integer.
The advantage is that this module handles the variables no matter where they’re located, no matter if there’s 1 or 2 ‘-’ before. In short, only happiness for the end user.
For more info: http://perldoc.perl.org/Getopt/Long.html
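Here’s a sketch of a typical call. The option names (name, count, help) are invented, and I fill @ARGV by hand to simulate a command line:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptions);

# Simulated command line: mixed order, one or two dashes.
local @ARGV = ('--name', 'fred', '-count', '3', '-h', 'extra.txt');

# Default values, used when an option is absent.
my ($name, $count, $help) = ('', 1, 0);

GetOptions(
    'name=s'  => \$name,    # =s: the option takes a string
    'count=i' => \$count,   # =i: the option takes an integer
    'help|h'  => \$help,    # flag, also available as -h
) or die "Usage: $0 [--name NAME] [--count N] [--help]\n";

print "name=$name count=$count help=$help\n";
# prints "name=fred count=3 help=1"

# Whatever is not an option stays in @ARGV.
print "remaining: @ARGV\n";
# prints "remaining: extra.txt"
```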
Term::ANSIColor
This is to use colors. Personally, I use it to display OK in green and Failed in red. A little example:
|
|
This should give you an idea of how colors work.
Term::ReadKey
Here’s a simple but useful thing, if we don’t want passwords to be displayed when someone types something on STDIN:
|
|
Creating a Module
Creating a module can be convenient to separate parts of your code. Most of the time, it is preferable to put only functions in modules and to call them from the main code.
To create a module, it’s simple. Imagine I have 2 scripts:
- main.pl (my main script)
- my_module.pm (my module)
So the module must have the “.pm” extension. Then, it must contain this at the beginning:
|
|
And for it to be valid, the end of the module must end like this:
|
|
That’s it :-)
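To show the whole skeleton at once, here’s a sketch where the module and the main script live in the same file so it can be run as is. In real life the first part would be in my_module.pm and loaded with use; the greet function and the use of Exporter are assumptions of mine:

```perl
# --- This part would normally be the content of my_module.pm ---
package my_module;
use strict;
use warnings;
use Exporter 'import';

# Functions that the main script is allowed to import.
our @EXPORT_OK = qw(greet);

sub greet {
    my ($who) = @_;
    return "Hello, $who!";
}

1;   # a module must end with a true value

# --- This part would normally be main.pl ---
package main;

# With two separate files, this line would be: use my_module qw(greet);
my_module->import('greet');

my $message = greet("world");
print "$message\n";
# prints "Hello, world!"
```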
Creating a Binary
I don’t really like this policy but it can be very interesting to create binaries from perl source code. To do this, just use the perlcc command:
|
|
And there you go :-), it’s very simple and your code is no longer disclosed.
Creating an exe Under Windows
There are several commercial tools to make exes. For my part, I chose once again to use something that is both free software and free of charge. First, you need to install ActivePerl or Strawberry Perl to be able to run Perl under Windows (Cygwin might also do the trick). Personally, I have a slight preference for Strawberry because it’s very similar to Linux and it’s completely free.
With Strawberry
Open the Perl Command Line, run cpan and install the following modules:
|
|
With ActivePerl
Now, use Perl Package Manager to install the following packages as it will allow us to install all the necessary dependencies (in View, click on “All packages” to see all available packages):
We’ll also take advantage to install these packages:
- PAR
- MinGW
- Getopt-ArgvFile
- Module-ScanDeps
- Parse-Binary
- Win32-Exe
- PAR-Packer
If the PAR-Packer package is not available for your version, you’ll have to get it from an external site PAR-Packer. To install it, make sure you have version 5.10 of Perl, otherwise adapt with the right file, and run this command under Windows:
|
|
Or else, add the repository in PPM preferences:
- Name: A repository of Bioperl packages
- Location: http://bioperl.org/DIST
Generate the exe
Now we can generate .exe from a Perl script very easily:
|
|
So here I compiled a test.pl file into test.exe. Simple, right? :-)
If you want to add an icon add your icon at the end like this:
|
|
If you encounter a problem with missing libraries when launching the exe, you must specify a path during compilation to tell it where to find them (--lib):
|
|
Memos
Here are some small memos that I use quite often.
clear
To do the equivalent of a clear (clear the screen), here’s the solution in perl:
|
|
If you want to clear the current line, for example to make a countdown:
|
|
Display the Paths of Your Perl Libraries
To display all available paths for libraries, we’ll use this command:
|
|
Getting the PID of Your Program
You can get your Perl program’s PID very simply:
|
|
Resources
- Official Perl Site
- Perl Modules and Documentation
- To validate your code and claim to have “Clean” code
- Les Mongueurs de Perl
- Perlport
- O’Reilly’s Introduction to Perl (I highly recommend it)
- Another documentation on Introduction to Perl
- New Features of Perl 5.10 Part 1
- New Features of Perl 5.10 Part 2
- Documentation to Create a Perl Module
Last updated 24 Sep 2013, 11:20 CEST.