Wednesday, August 5, 2009

Learn Perl Easy, Part 2

Variables

A variable is a symbolic placeholder for a value, a lot like the variables in algebra. Perl has several built-in variable types:

Scalars: $variable_name A single-valued variable, always preceded by a $ sign.

Arrays: @array_name A multi-valued variable indexed by integer, preceded by an @ sign.

Hashes: %hash_name A multi-valued variable indexed by string, preceded by a % sign.

Filehandles: FILEHANDLE_NAME A file to read from and/or write to. Filehandles have no special prefix, but are usually written in all uppercase. We discuss arrays, hashes and filehandles later.

Scalar Variables

Scalar variables have names beginning with $. The name must begin with a letter or underscore, and can contain as many letters, numbers or underscores as you like. These are all valid scalars:

  • $foo
  • $The_Big_Bad_Wolf
  • $R2D2
  • $_____A23
  • $Once_Upon_a_Midnight_Dreary_While_I_Pondered_Weak_and_Weary

You assign values to a scalar variable using the = operator (not to be confused with ==, which is numeric comparison). You read from scalar variables by using them wherever a value would go.

A scalar variable can contain strings, floating point numbers, integers, and more esoteric things. You don't have to predeclare scalars. A scalar that once held a string can be reused to hold a number, and vice-versa:

Code:


$p = 'Potato'; # $p now holds the string "Potato"
$bushels = 3; # $bushels holds the value 3
$potatoes_per_bushel = 80; # $potatoes_per_bushel contains 80

$total_potatoes = $bushels * $potatoes_per_bushel; # 240

print "I have $total_potatoes $p\n";

Output:

I have 240 Potato

Scalar Variable String Interpolation

The example above shows one of the interesting features of double-quoted strings. If you place a scalar variable inside a double-quoted string, its value is interpolated into the string. With a single-quoted string, no interpolation occurs.

To prevent interpolation inside a double-quoted string, place a backslash in front of the $ sign:


print "I have \$total_potatoes \$p\n";

# prints: I have $total_potatoes $p
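Here is a minimal sketch that puts the three cases side by side. The variable names are only for illustration, and it adds use strict and use warnings, which are explained at the end of this post.

```perl
use strict;
use warnings;

my $p = 'Potato';
my $double  = "I have $p\n";    # double quotes: $p is interpolated, \n is a newline
my $single  = 'I have $p\n';    # single quotes: no interpolation, \n stays as two characters
my $escaped = "I have \$p\n";   # backslash stops interpolation inside double quotes

print $double;    # I have Potato
print $single;    # I have $p\n   (no newline is printed)
print "\n";
print $escaped;   # I have $p
```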

Operations on Scalar Variables

You can use a scalar in any string or numeric expression like $hypotenuse = sqrt($x**2 + $y**2) or $name = $first_name . ' ' . $last_name. There are also numerous shortcuts that combine an operation with an assignment:

$a++ Increment $a by one

$a-- Decrement $a by one

$a += $b Modify $a by adding $b to it.

$a -= $b Modify $a by subtracting $b from it.

$a *= $b Modify $a by multiplying it by $b.

$a /= $b Modify $a by dividing it by $b.

$a .= $b Modify the string in $a by appending $b to it.

Example Code:

$potatoes_per_bushel = 80; # $potatoes_per_bushel contains 80

$p = 'one';
$p .= ' '; # append a space
$p .= 'potato'; # append "potato"

$bushels = 3;
$bushels *= $potatoes_per_bushel; # multiply

print "From $p come $bushels.\n";

Output:

From one potato come 240.

Preincrement vs postincrement

The increment (++) operator can be placed before or after the variable name, and in either case, the effect on the variable is to bump it up by one. However, when you put the operator before the variable name, the value of the expression as a whole is the value of the variable after the operation (preincrement). If you put the operator after the variable name, the value of the expression is the value of the variable before it was incremented:

$potatoes = 80; # $potatoes holds 80

$onions = ++$potatoes; # $onions holds 81, $potatoes holds 81

$parsnips = $potatoes++; # $parsnips holds 81, $potatoes holds 82

The decrement (--) operator works the same way.
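As a quick check, this small sketch repeats the increment example with -- instead of ++, using the same illustrative vegetable names:

```perl
use strict;
use warnings;

my $potatoes = 80;
my $onions   = --$potatoes;   # predecrement: $potatoes becomes 79, $onions gets 79
my $parsnips = $potatoes--;   # postdecrement: $parsnips gets 79, then $potatoes becomes 78

print "onions=$onions parsnips=$parsnips potatoes=$potatoes\n";
# prints: onions=79 parsnips=79 potatoes=78
```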

Weird Perl Assignment Idioms

Modify a Value and Save the Original in One Operation

$potatoes = 80; # $potatoes holds 80
($onions = $potatoes) += 10;

# $onions now 90, and $potatoes still 80

$sequence = 'GAGTCTTTTGGG';
($reversec = reverse $sequence) =~ tr/GATC/CTAG/;
# reverse reverses a string
# tr/// translates one set of characters into another

# $sequence holds 'GAGTCTTTTGGG'
# $reversec holds 'CCCAAAAGACTC'

Swap the Values of Two Variables

Here's a simple way to swap the values of two variables in one fast step:

($onions,$potatoes) = ($potatoes,$onions);

# $onions now holds the original value of $potatoes, and vice-versa

Rotate the Values of Three Variables

($onions,$potatoes,$turnips) = ($potatoes,$turnips,$onions);

# $onions <- $potatoes
# $potatoes <- $turnips
# $turnips <- $onions
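To make the swap and rotate idioms concrete, here is a sketch that starts from the illustrative values 1, 2 and 3 and prints the variables after each step:

```perl
use strict;
use warnings;

my ($onions, $potatoes, $turnips) = (1, 2, 3);

# swap: the right-hand list is built first, then assigned
($onions, $potatoes) = ($potatoes, $onions);
print "$onions $potatoes $turnips\n";   # 2 1 3

# rotate: each variable takes the value of the one after it
($onions, $potatoes, $turnips) = ($potatoes, $turnips, $onions);
print "$onions $potatoes $turnips\n";   # 1 3 2
```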

Processing Command Line Arguments

When a Perl script is run, its command-line arguments (if any) are stored in an automatic array called @ARGV. You'll learn how to manipulate this array later. For now, just know that you can call the shift function repeatedly from the main part of the script to retrieve the command line arguments one by one.

Printing the Command Line Argument

Code:


#!/usr/bin/perl
# file: echo.pl

$argument = shift;
print "The first argument was $argument.\n";

Output:

(~) 50% chmod +x echo.pl
(~) 51% echo.pl tuna
The first argument was tuna.
(~) 52% echo.pl tuna fish
The first argument was tuna.
(~) 53% echo.pl 'tuna fish'
The first argument was tuna fish.
(~) 53% echo.pl
The first argument was .
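Calling shift in a loop is one way to walk through every argument. This is a sketch under a few assumptions: the file name echo_all.pl is hypothetical, the while loop (not covered yet in this series) stops when shift returns undef after @ARGV runs out, and the script fills in two sample arguments when you give it none, purely so it does something when run bare. Testing with defined rather than plain truth keeps an argument of 0 from ending the loop early.

```perl
#!/usr/bin/perl
# file: echo_all.pl (hypothetical name)
use strict;
use warnings;

# Demonstration only: supply sample arguments if none were given.
@ARGV = ('tuna', 'fish') unless @ARGV;

my $count = 0;
my $all   = '';
# Outside a subroutine, shift with no argument removes and returns
# the first element of @ARGV; it returns undef once @ARGV is empty.
while (defined(my $argument = shift)) {
    $count++;
    $all .= "$argument ";
    print "Argument $count was $argument.\n";
}
```

Run without arguments, it should print "Argument 1 was tuna." and "Argument 2 was fish."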

Computing the Hypotenuse of a Right Triangle

Code:


#!/usr/bin/perl
# file: hypotenuse.pl

$x = shift;
$y = shift;
$x>0 and $y>0 or die "Must provide two positive numbers";

print "Hypotenuse=",sqrt($x**2+$y**2),"\n";

Output:

(~) 82% hypotenuse.pl
Must provide two positive numbers at hypotenuse.pl line 6.
(~) 83% hypotenuse.pl 1
Must provide two positive numbers at hypotenuse.pl line 6.
(~) 84% hypotenuse.pl 3 4
Hypotenuse=5
(~) 85% hypotenuse.pl 20 18
Hypotenuse=26.9072480941474
(~) 86% hypotenuse.pl -20 18
Must provide two positive numbers at hypotenuse.pl line 6.

Basic I/O

I/O means "Input/Output". It's how your program communicates with the world.

Output

The print() function does it all:

Code:


#!/usr/bin/perl
# file: print.pl

$sidekick = 100;
print "Maxwell Smart's sidekick is ",$sidekick-1,".\n";
print "If she had a twin, her twin might be called ",2*($sidekick-1),".\n";

Output:

(~) 50% chmod +x print.pl
(~) 51% print.pl
Maxwell Smart's sidekick is 99.
If she had a twin, her twin might be called 198.

We will learn later how to print to a file rather than the terminal.

Input

The <> operator does input. When the script is run without file-name arguments, it reads a line of input from the terminal. At the point where <> appears, the script stops and waits for the user to type a line of input; <> then returns that line, which you can copy into a variable.


#!/usr/bin/perl
# file: dog_years.pl

print "Enter your age: ";
$age = <>;
print "Your age in dog years is ",$age/7,"\n";

Output:

(~) 50% dog_years.pl
Enter your age: 42
Your age in dog years is 6

We will learn later how to take input from a file rather than the terminal.

The chomp() Function

When <> reads a line of input, the newline character at the end is included. Because of this, the program below doesn't do exactly what you expect:


#!/usr/bin/perl
print "Enter your name: ";
$name = <>;
print "Hello $name, happy to meet you!\n";

Output:

% hello.pl
Enter your name: Lincoln
Hello Lincoln
, happy to meet you!

If you want to get rid of the newline, you can chomp() it off. chomp() removes the trailing newline if there is one, and does nothing if there isn't.

This program works right:


#!/usr/bin/perl
print "Enter your name: ";
$name = <>;
chomp $name;
print "Hello $name, happy to meet you!\n";

Output:

% hello.pl
Enter your name: Lincoln
Hello Lincoln, happy to meet you!
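chomp() also returns the number of characters it removed, which makes its do-nothing-if-no-newline behavior easy to see. A minimal sketch, with a hard-coded string standing in for a line read by <>:

```perl
use strict;
use warnings;

my $name = "Lincoln\n";            # stands in for a line read with <>
my $removed = chomp $name;         # strips the trailing newline
print "[$name] removed $removed\n";   # [Lincoln] removed 1

my $again = chomp $name;           # no newline left, so nothing changes
print "[$name] removed $again\n";     # [Lincoln] removed 0
```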

Numeric Comparisons


$a = 4 == 4;      # TRUE
$a = 4 == 2 + 2;  # TRUE
$a = 4 == $b;     # depends on what $b is

$a = 4 != 4;      # FALSE
$a = 4 != 2 + 2;  # FALSE
$a = 4 != $b;     # depends on what $b is

$a = 4 > 3;       # TRUE
$a = 4 < 3;       # FALSE
$a = 4 > $b;      # depends on what $b is

$a = 4 >= 3;      # TRUE
$a = 4 >= 4;      # TRUE
$a = 4 <= $b;     # depends on what $b is

$result = $a <=> $b;

$result is

  • -1 if the left side is less than the right side
  • 0 if the left side equals the right side
  • +1 if the left side is greater than the right side
NB: <=> is really useful in conjunction with the sort() function.
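A short sketch of <=> on its own and inside sort(). sort hands each pair of elements to the block as the special variables $a and $b (one reason real scripts avoid using $a and $b as ordinary variables); without a block, sort compares the elements as strings, which puts 100 before 9. The numbers are made up.

```perl
use strict;
use warnings;

my @results = (3 <=> 7, 7 <=> 7, 10 <=> 7);
print "@results\n";       # -1 0 1

# numeric sort: the block returns -1, 0 or +1 for each pair $a, $b
my @numbers = (10, 9, 100, 1);
my @sorted  = sort { $a <=> $b } @numbers;
print "@sorted\n";        # 1 9 10 100

# default sort compares as strings, character by character
my @as_strings = sort @numbers;
print "@as_strings\n";    # 1 10 100 9
```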

String Comparisons

$a = 'fred' eq 'fred';                            # TRUE
$a = 'fred and lucy' eq 'fred' . ' and ' . 'lucy'; # TRUE
$a = 'fred' eq $b;                                 # depends on what $b is

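== is for numeric comparison. eq is for string comparison. If you use == on strings, Perl converts both sides to numbers; 'fred' and 'lucy' both become 0, so the test below is TRUE (with -w, Perl also warns that the arguments aren't numeric).

$a = 'fred' == 'lucy';  # WRONG WRONG WRONG!

$a = 'fred' ne 'fred';  # FALSE
$a = 'fred' ne 'lucy';  # TRUE
$a = 'fred' ne $b;      # depends on what $b is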

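Use gt, lt, ge and le for "greater than", "less than", "greater than or equal to" and "less than or equal to". Strings are compared character by character in ASCII order, in which every uppercase letter comes before every lowercase letter:

$a = 'fred' gt 'lucy';  # FALSE
$a = 'fred' lt 'lucy';  # TRUE
$a = 'Lucy' lt 'lucy';  # TRUE
$a = 'Lucy' lt 'fred';  # TRUE !!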

Use cmp to Compare Two Strings

$result = $a cmp $b

$result is

  • -1 if the left side is less than the right side
  • 0 if the left side equals the right side
  • +1 if the left side is greater than the right side
NB: cmp is really useful in the sort() function.
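Here is cmp used the same way, as a sketch with made-up names. It assumes Perl's default character-by-character ordering, with no use locale in effect, which is why 'Lucy' sorts first:

```perl
use strict;
use warnings;

my @results = ('fred' cmp 'lucy', 'lucy' cmp 'Lucy', 'fred' cmp 'fred');
print "@results\n";   # -1 1 0

# string sort: uppercase 'L' comes before lowercase 'f' and 'l'
my @names  = ('lucy', 'Lucy', 'fred');
my @sorted = sort { $a cmp $b } @names;
print "@sorted\n";    # Lucy fred lucy
```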

If-Else Statements

Use else blocks for either/or constructions.

if ($a == $b) {
    print "a equals b\n";
    $a += $b;
} else {
    print "a does not equal b\n";
    die "Operation aborted!";
}

You can string tests together using elsif:

if ($a > 100) {
    die "a is too large\n";
} elsif ($a < 0) {
    die "a is too small\n";
} else {
    print "a is the right size\n";
}

Logical Operators

To combine comparisons, use the and, or and not logical operators. In some scripts, you might see their cryptic cousins, &&, || and !:

Lower precedence   Higher precedence   Description
$a and $b          $a && $b            TRUE if both $a AND $b are TRUE
$a or $b           $a || $b            TRUE if either $a OR $b is TRUE
not $a             !$a                 TRUE if $a is FALSE

if ($a < 100 and $a > 0) {
    print "a is the right size\n";
} else {
    die "out of bounds error, operation aborted!";
}

if ($a < 100 && $a > 0) {
    print "a is the right size\n";
} else {
    die "out of bounds error, operation aborted!";
}

if ($a >= 100 or $a <= 0) { die "out of bounds error, operation aborted!"; }
if ($a >= 100 || $a <= 0) { die "out of bounds error, operation aborted!"; }

To Reverse Truth, use not or !


$ok = ($a < 100 and $a > 0);
print "a is too small\n" if not $ok;

# same as this:
print "a is too small\n" unless $ok;

# and this:
print "a is too small\n" if !$ok;

and vs &&, or vs ||

&& has higher precedence than and, and || has higher precedence than or. This matters in assignments, because = binds more tightly than and/or but less tightly than &&/||:

Low precedence operation:

$ok = $a < 100 and $a > 0;
# This doesn't mean:
$ok = ($a < 100 and $a > 0);

# but:
($ok = $a < 100) and $a > 0;

High precedence operation:

$ok = $a < 100 && $a > 0;
# This does mean:
$ok = ($a < 100 && $a > 0);
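To see the difference in practice, here is a sketch with an illustrative value $x = -5, which fails the range test. use warnings is deliberately left out, because Perl may warn that the comparison after and in the first assignment is useless in void context (a useful hint that the line doesn't do what it looks like).

```perl
use strict;

my $x = -5;   # below the range, so the whole test should be FALSE

my $low  = $x < 100 and $x > 0;   # parsed as (my $low = $x < 100) and $x > 0
my $high = $x < 100 && $x > 0;    # parsed as my $high = ($x < 100 && $x > 0)

print "low:  [$low]\n";    # low:  [1]   only $x < 100 was stored
print "high: [$high]\n";   # high: []    the whole test is FALSE
```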

The "or die" Idiom

The or and || operators short-circuit: if what is on the left is TRUE, then what is on the right is never evaluated, because it doesn't need to be.

$a = 10; $b = 99;
$a < 100 or $b = 0;   # $a < 100 is TRUE, so $b = 0 is never evaluated
# $b still holds 99

The die() Function Aborts Execution with an Error Message


die "\$a is the wrong size" unless ($a <> 0);

You Combine Them Idiomatically Like This

($a < 100 and $a > 0) or die "\$a is the wrong size";


You can use "and" in the Same Way

If what is on the left of the "and" is FALSE, then Perl doesn't evaluate what's on the right, because it doesn't need to.

($a < 100 and $a > 0) and print "\$a is the right size\n";
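This sketch makes short-circuiting visible by putting an increment on the right-hand side of each operator and counting how many of them actually run. $size and $evaluated are illustrative names.

```perl
use strict;
use warnings;

my $size      = 10;
my $evaluated = 0;   # counts how many right-hand sides actually ran

$size < 100 or  $evaluated++;   # left side TRUE, so the right side is skipped
$size > 100 and $evaluated++;   # left side FALSE, so the right side is skipped
$size > 0   and $evaluated++;   # left side TRUE, so the right side runs

print "right-hand sides evaluated: $evaluated\n";   # right-hand sides evaluated: 1
```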

File Tests

A bunch of operators are used to check whether files exist, directories exist, files are readable, etc.

-e file exists

-r file is readable

-x file is executable

-w file is writable

-d file is a directory

Code:

-w "./fasta.out" or die "Can't write to file";   # also dies if ./fasta.out doesn't exist

print "This file is executable\n" if -x "/usr/bin/perl";


Simple Pattern Matches

To test whether a variable matches a pattern, use the =~ operator:

$a = 'gatttccaa';
print "contains three t's\n" if $a =~ /ttt/;
print "contains an EcoRI site\n" if $a =~ /gaattc/;

Some Simple Regular Expression Components

Some symbols between the // are special:

^ Matches the beginning of the string.

$ Matches the end of the string.

\w Matches any single word character (a letter, a digit or the underscore).

\w+ Matches one or more word characters.

\d Matches a single digit.

\d+ Matches one or more digits.

Code:

$a = '367-8380';
print "This is an OK telephone number.\n" if $a =~ /^\d\d\d-\d\d\d\d$/;


What is False?

The number 0, the string "0", the empty string, the empty list and the undefined value (undef) are all FALSE. Every other value is TRUE.


Distinguishing Between the Empty String and 0


$a = '';
$b = 0;

$result = $a eq ''; # TRUE
$result = $b eq ''; # FALSE
$result = length $a > 0; # FALSE

Distinguishing Between the Empty String and undef


$a = undef;
$b = '';

$result = defined $a; # FALSE
$result = defined $b; # TRUE
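A sketch that tests a handful of values in boolean context. The list is illustrative; the interesting cases are '0.0', '00' and ' ', which are TRUE because only the exact string "0" and the empty string count as false strings. (The empty list can't appear as an element of a list, so it isn't shown.)

```perl
use strict;
use warnings;

my $false_count = 0;
for my $value (0, '0', '', undef, '0.0', '00', ' ', 'abc', 1) {
    # show undef by name, since it has no printable value
    my $shown = defined $value ? "'$value'" : 'undef';
    if ($value) {
        print "$shown is TRUE\n";
    } else {
        print "$shown is FALSE\n";
        $false_count++;
    }
}
print "$false_count values were FALSE\n";   # 4 values were FALSE
```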

strict and -w

Because you don't have to predeclare variables in Perl, there is a big problem with typos:

$value = 42;
print "Value is OK\n" if $valu <>

The -w Switch Will Warn of Uninitialized Variables

#!/usr/bin/perl -w

$value = 42;
print "Value is OK\n" if $valu <>

% perl uninit.pl
Name "main::valu" used only once: possible typo at uninit.pl line 4.
Name "main::value" used only once: possible typo at uninit.pl line 3.
Use of uninitialized value in numeric gt (>) at uninit.pl line 4.

"use strict"

The "use strict" pragma forces you to predeclare all variables using "my":

#!/usr/bin/perl -w

use strict;
$value = 42;
print "Value is OK\n" if $valu <>

% perl uninit.pl
Global symbol "$value" requires explicit package name at uninit.pl line 4.
Global symbol "$valu" requires explicit package name at uninit.pl line 5.
Execution of uninit.pl aborted due to compilation errors.

Fixing the typo isn't enough, because $value itself still hasn't been declared:

#!/usr/bin/perl -w

use strict;
$value = 42;
print "Value is OK\n" if $value <>

% perl uninit.pl
Global symbol "$value" requires explicit package name at uninit.pl line 4.
Execution of uninit.pl aborted due to compilation errors.

Declaring the variable with my fixes the problem:

#!/usr/bin/perl -w

use strict;
my $value = 42;
print "Value is OK\n" if $value <>

% perl uninit.pl
Value is OK

Using my

You can use "my" on a single variable, or on a list of variables:
my $value = 42;
my $a;
my ($c,$d,$e,$f);
my ($first,$second) = (1,2);
