Thursday, August 20, 2009

Learn perl easy part5

perl 1 2 3 4
What is a Subroutine?

We have been using a form of subroutines all along. Perl functions are basically built in subroutines. You call them (or "invoke") a function by typing its name, and giving it one or more arguments.

Example: Length

my $seq = 'ATGCAAATGCCA';

my $seq_length = length $seq; ## OR
my $seq_length = length($seq);

# $seq_length now contains 12

Perl gives you the opportunity to define your own functions, called "subroutines". In the simplest sense, subroutines are named blocks of code that can be reused as many times as you wish.

Example: A very basic subroutine


sub Hello {
print "Hello World!!\n";
}

print "Sometimes I just want to shout ";
Hello(); #or &Hello;


Example: Some simple subroutines


sub hypotenuse {
my ($a,$b) = @_;
return sqrt($a**2 + $b**2);
}
sub E {
return 2.71828182845905;
}
#########

$y = 3;
$x = hypotenuse($y,4);
# $x now contains 5

$x = hypotenuse((3*$y),12);
# $x now contains 15

$value_e = E();
# $value_e now contains 2.71828182845905

This way of using subroutines makes them look suspiciously like functions. Note: Unlike a function, you must use parentheses when calling a subroutine in this manner, even if you are giving it no arguments.
The Magic Array - @_

Perhaps the most important concept to understand is that values are passed to the subroutine in the default array @_. This array springs magically into existence, and contains the list of values that you gave to subroutine (within the parentheses).

Example: The magic of @_


sub Add_two_numbers {
my ($number1) = shift; # get first argument from @_ and put it in $number1
my ($number2) = shift; # get second argument from @_ and put it in $number2

my $sum = $number1 + $number2;
return $sum;
}

sub Add_two_numbers_2 {
my ($number1,$number2) = @_;
my $sum = $number1 + $number2;
return $sum;
}

sub Add_two_numbers_arcane {
return ($_[0] + $_[1]);
}

Some Subroutine Notes

* Use a name for your subroutine that makes sense to you. Avoid using names that Perl already uses (like "length" or "print"), unless you really like making yourself miserable.
* If you don't give a return statement, the subroutine will return the last value calculated.
* You may have multiple return statements. The first one that is executed will exit the subroutine

Example: A more complex subroutine with different returns


sub Number_Examiner {
my $number = shift;

unless ($number =~ /^\d+$/){
return "You sure this is a number?";
}
if ($number >= 100){
return "Big Number!";
}
elsif ($number > 50){
return "Bigger than 50!;
}
else {
return "Wee Little Number";
}
}

* You can return either a single value or a list of values. You can, if you wish, return nothing. Remember to use your subroutine in a way that reflects the number of values you expect to get back.

Example: Know what you expect


my ($value1,$value2,$value3) = ReturnThreeValues();
# if you are expecting three values back, make space for them.
my (@values) = ReturnThreeValues(); # another way to do it

my ($value1,$value2) = ReturnThreeValues();
# the last value is lost, gone, vanished, DOA... You may have
wanted to do this.

"my" Variables

Variables that you use in a subroutine should be made private to that subroutine with the my operator. This avoids accidentally overwriting similarly-named variables in the main program. If you already included use strict at the top of your program, perl will check that all variables are introduced with my.

Why Use My?



my $var = "Boo!";
Scary();
print "$var\n";

sub Scary{
print "$var\n";
$var = "Eeek!";
}

# The results:
Boo!
Eeek!


Variables made private with my only exist within a block (curly braces). The subroutine body is a block, so the my variables only exist within the body of the subroutine.

You can make scalars, arrays and hashes private. If you apply my() to a list, it makes each member of the list private.

{ # start a block
my $scalar; # $scalar is private
my @array; # now @array is private
my %hash; # %hash is private

# same thing, but in one swell foop
my ($scalar,@array,%hash);
}


lectuer 7:

1. Using a Module
2. Getting Module Documentation
3. Installing Modules
4. More About Importing
5. Where are Modules Installed?
6. The Anatomy of a Module
7. Exporting Variables & Functions from Modules
8. Using Object-Oriented Modules


Using a Module

A module is a package of useful subroutines and variables that someone has put together. Modules extend the ability of Perl.
Example 1: The File::Basename Module

The File::Basename module is a standard module that is distributed with Perl. When you load the File::Basename module, you get two new functions, basename and dirname.

basename takes a long UNIX path name and returns the file name at the end. dirname takes a long UNIX path name and returns the directory part.


#!/usr/bin/perl
# file: basename.pl

use strict;
use File::Basename;

my $path = '/bush_home/bush1/lstein/C1829.fa';
my $base = basename($path);
my $dir = dirname($path);

print "The base is $base and the directory is $dir.\n";

The output of this program is:

The base is C1829.fa and the directory is /bush_home/bush1/lstein.

The use function loads up the module named File::Basename and imports the two functions. If you didn't use use, then the program would print an error:

Undefined subroutine &main::basename called at basename.pl line 8.

Example 2: The Env Module

The Env module is a standard module that provides access to the environment variables. When you load it, it imports a set of scalar variables corresponding to your environment.

#!/usr/bin/perl
# file env.pl

use strict;
use Env;

print "My home is $HOME\n";
print "My path is $PATH\n";
print "My username is $USER\n";

When this runs, the output is:

My home is /bush_home/bush1/lstein
My path is /net/bin:/usr/bin:/bin:/usr/local/bin:/usr/X11R6/bin:/bush_home/bush1/lstein/bin:.
My username is lstein

Controlling What Gets Imported

Each module will automatically import a different set of variables and subroutines when you use it. You can control what gets imported by providing use with a list of what to import.

By default the Env module will import all the environment variables. You can make it import only some:

#!/usr/bin/perl
# file env2.pl

use strict;
use Env '$HOME','$PATH';

print "My home is $HOME\n";
print "My path is $PATH\n";
print "My username is $USER\n";

Global symbol "$USER" requires explicit package name at env2.pl line 9.
Execution of env2.pl aborted due to compilation errors.

You can import scalars, hashes, arrays and functions by giving a list of strings containing the variable or function names. This line imports a scalar named $PATH, an array named @PATH, and a function named printenv.

#!/usr/bin/perl

use Env '$PATH','@PATH','printenv';

print join "\n",@PATH;

Output:

/net/bin
/usr/bin
/bin
/usr/local/bin
/usr/X11R6/bin
/bush_home/bush1/lstein/bin
.

You will often see the qw() operator used to reduce typing:

use TestModule qw($PATH $HOME @PATH printenv);


Finding out What Modules are Installed

Here are some tricks for finding out what Modules are installed.
Preinstalled Modules

To find out what modules come with perl, look in Appendix A of Perl 5 Pocket Reference. From the command line, use the perldoc command from the UNIX shell. All the Perl documentation is available with this command:

% perldoc perlmodlib
PERLMODLIB(1) User Contributed Perl Documentation PERLMODLIB(1)

NAME
perlmodlib - constructing new Perl modules and finding
existing ones

DESCRIPTION
THE PERL MODULE LIBRARY
Many modules are included the Perl distribution. These
are described below, and all end in .pm. You may discover
...
Standard Modules

Standard, bundled modules are all expected to behave in a
well-defined manner with respect to namespace pollution
because they use the Exporter module. See their own docu-
mentation for details.

AnyDBM_File Provide framework for multiple DBMs

AutoLoader Load subroutines only on demand

AutoSplit Split a package for autoloading

B The Perl Compiler
...

To learn more about a module, run perldoc with the module's name:

% perldoc File::Basename

NAME
fileparse - split a pathname into pieces

basename - extract just the filename from a path

dirname - extract just the directory from a path

SYNOPSIS
use File::Basename;

($name,$path,$suffix) = fileparse($fullname,@suffixlist)
fileparse_set_fstype($os_string);
$basename = basename($fullname,@suffixlist);
$dirname = dirname($fullname);
...

Optional Modules that You May Have Installed

perldoc perllocal will list the names of locally installed modules.

% perldoc perllocal
Thu Apr 27 16:01:31 2000: "Module" the DBI manpage

o "installed into: /usr/lib/perl5/site_perl"

o "LINKTYPE: dynamic"

o "VERSION: 1.13"

o "EXE_FILES: dbish dbiproxy"

Thu Apr 27 16:01:41 2000: "Module" the Data::ShowTable
manpage

o "installed into: /usr/lib/perl5/site_perl"

o "LINKTYPE: dynamic"

o "VERSION: 3.3"

o "EXE_FILES: showtable"

Tue May 16 18:26:27 2000: "Module" the Image::Magick man-
page
...


Installing Modules

You can find thousands of Perl Modules on CPAN, the Comprehensive Perl Archive Network:

http://www.cpan.org

Installing Modules Manually

Search for the module on CPAN using the keyword search. When you find it, download the .tar.gz module. Then install it like this:

% tar zxvf bioperl-0.7.1.tar.gz
bioperl-0.7.1/
bioperl-0.7.1/Bio/
bioperl-0.7.1/Bio/DB/
bioperl-0.7.1/Bio/DB/Ace.pm
bioperl-0.7.1/Bio/DB/GDB.pm
bioperl-0.7.1/Bio/DB/GenBank.pm
bioperl-0.7.1/Bio/DB/GenPept.pm
bioperl-0.7.1/Bio/DB/NCBIHelper.pm
bioperl-0.7.1/Bio/DB/RandomAccessI.pm
bioperl-0.7.1/Bio/DB/SeqI.pm
bioperl-0.7.1/Bio/DB/SwissProt.pm
bioperl-0.7.1/Bio/DB/UpdateableSeqI.pm
bioperl-0.7.1/Bio/DB/WebDBSeqI.pm
bioperl-0.7.1/Bio/AlignIO.pm

% perl Makefile.PL
Generated sub tests. go make show_tests to see available subtests
...
Writing Makefile for Bio

% make
cp Bio/Tools/Genscan.pm blib/lib/Bio/Tools/Genscan.pm
cp Bio/Root/Err.pm blib/lib/Bio/Root/Err.pm
cp Bio/Annotation/Reference.pm blib/lib/Bio/Annotation/Reference.pm
cp bioback.pod blib/lib/bioback.pod
cp Bio/AlignIO/fasta.pm blib/lib/Bio/AlignIO/fasta.pm
cp Bio/Location/NarrowestCoordPolicy.pm blib/lib/Bio/Location/NarrowestCoordPolicy.pm
cp Bio/AlignIO/clustalw.pm blib/lib/Bio/AlignIO/clustalw.pm
cp Bio/Tools/Blast/Run/postclient.pl blib/lib/Bio/Tools/Blast/Run/postclient.pl
cp Bio/LiveSeq/Intron.pm blib/lib/Bio/LiveSeq/Intron.pm
...
Manifying blib/man3/Bio::LiveSeq::Exon.3
Manifying blib/man3/Bio::Location::CoordinatePolicyI.3
Manifying blib/man3/Bio::SeqFeature::Similarity.3

% make test
PERL_DL_NONLAZY=1 /net/bin/perl -Iblib/arch -Iblib/lib -I/net/lib/perl5/5.6.1/i686-linux -I/net/lib/perl5/5.6.1 -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests @ARGV;' t/*.t
t/AAChange..........ok
t/AAReverseMutate...ok
t/AlignIO...........ok
t/Allele............ok
...
t/WWW...............ok
All tests successful, 95 subtests skipped.
Files=60, Tests=1011, 35 wallclock secs (25.47 cusr + 1.60 csys = 27.07 CPU)

% make install
Installing /net/lib/perl5/site_perl/5.6.1/bioback.pod
Installing /net/lib/perl5/site_perl/5.6.1/biostart.pod
Installing /net/lib/perl5/site_perl/5.6.1/biodesign.pod
Installing /net/lib/perl5/site_perl/5.6.1/bptutorial.pl
...

If you have an older version of the tar program, you may need to replace the first step with this:

% gunzip -c bioperl-0.7.1.tar.gz | tar xvf -

Installing Modules Using the CPAN Shell

Perl has a CPAN module installer built into it. You run it like this:

% perl -MCPAN -e shell

cpan shell -- CPAN exploration and modules installation (v1.59_54)
ReadLine support enabled

cpan>

From this shell, there are commands for searching for modules, downloading them, and installing them.

[The first time you run the CPAN shell, it will ask you a lot of configuration questions. Generally, you can just hit return to accept the defaults. The only trick comes when it asks you to select CPAN mirrors to download from. Choose any ones that are in your general area on the Internet and it will work fine.]

Here is an example of searching for the Text::Wrap program and installing it:

cpan> i /Wrap/
Going to read /bush_home/bush1/lstein/.cpan/sources/authors/01mailrc.txt.gz
CPAN: Compress::Zlib loaded ok
Going to read /bush_home/bush1/lstein/.cpan/sources/modules/02packages.details.txt.gz
Database was generated on Tue, 16 Oct 2001 22:32:59 GMT
CPAN: HTTP::Date loaded ok
Going to read /bush_home/bush1/lstein/.cpan/sources/modules/03modlist.data.gz
Distribution B/BI/BINKLEY/CGI-PrintWrapper-0.8.tar.gz
Distribution C/CH/CHARDIN/MailQuoteWrap0.01.tgz
Distribution C/CJ/CJM/Text-Wrapper-1.000.tar.gz
...
Module Text::NWrap (G/GA/GABOR/Text-Format0.52+NWrap0.11.tar.gz)
Module Text::Quickwrap (Contact Author Ivan Panchenko )
Module Text::Wrap (M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz)
Module Text::Wrap::Hyphenate (Contact Author Mark-Jason Dominus )
Module Text::WrapProp (J/JB/JBRIGGS/Text-WrapProp-0.03.tar.gz)
Module Text::Wrapper (C/CJ/CJM/Text-Wrapper-1.000.tar.gz)
Module XML::XSLT::Wrapper (M/MU/MULL/XML-XSLT-Wrapper-0.32.tar.gz)
41 items found

cpan> install Text::Wrap
Running install for module Text::Wrap
Running make for M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
ftp://archive.progeny.com/CPAN/authors/id/M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz
CPAN: MD5 loaded ok
Fetching with LWP:
ftp://archive.progeny.com/CPAN/authors/id/M/MU/MUIR/modules/CHECKSUMS
Checksum for /bush_home/bush1/lstein/.cpan/sources/authors/id/M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz ok
Scanning cache /bush_home/bush1/lstein/.cpan/build for sizes
Text-Tabs+Wrap-2001.0929/
Text-Tabs+Wrap-2001.0929/MANIFEST
Text-Tabs+Wrap-2001.0929/CHANGELOG
Text-Tabs+Wrap-2001.0929/Makefile.PL
Text-Tabs+Wrap-2001.0929/t/
Text-Tabs+Wrap-2001.0929/t/fill.t
Text-Tabs+Wrap-2001.0929/t/tabs.t
Text-Tabs+Wrap-2001.0929/t/wrap.t
Text-Tabs+Wrap-2001.0929/README
Text-Tabs+Wrap-2001.0929/lib/
Text-Tabs+Wrap-2001.0929/lib/Text/
Text-Tabs+Wrap-2001.0929/lib/Text/Wrap.pm
Text-Tabs+Wrap-2001.0929/lib/Text/Tabs.pm

CPAN.pm: Going to build M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz

Checking if your kit is complete...
Looks good
Writing Makefile for Text
cp lib/Text/Wrap.pm blib/lib/Text/Wrap.pm
cp lib/Text/Tabs.pm blib/lib/Text/Tabs.pm
Manifying blib/man3/Text::Wrap.3
Manifying blib/man3/Text::Tabs.3
/usr/bin/make -- OK
Running make test
PERL_DL_NONLAZY=1 /net/bin/perl -Iblib/arch -Iblib/lib -I/net/lib/perl5/5.6.1/i686-linux -I/net/lib/perl5/5.6.1 -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests @ARGV;' t/*.t
t/fill..............ok
t/tabs..............ok
t/wrap..............ok
All tests successful.
Files=3, Tests=37, 0 wallclock secs ( 0.20 cusr + 0.00 csys = 0.20 CPU)
/usr/bin/make test -- OK
Running make install
Installing /net/lib/perl5/5.6.1/Text/Wrap.pm
Installing /net/man/man3/Text::Wrap.3
Installing /net/man/man3/Text::Tabs.3
Writing /net/lib/perl5/5.6.1/i686-linux/auto/Text/.packlist
Appending installation info to /net/lib/perl5/5.6.1/i686-linux/perllocal.pod
/usr/bin/make install UNINST=1 -- OK

cpan> quit
Lockfile removed.


More About Importing

Recall that each module has a default list of functions and variables to import. Some modules import many functions by default, others import none. Most import some.

Modules that have a lot of functions and variables to import frequently put them into groups. Groups can be specified using the ":group" syntax.

For example, the CGI::Pretty module has a group called ":standard", which imports a bunch of standard functions for creating HTML pages.


#!/usr/bin/perl
# file: html.pl

use strict;
use CGI::Pretty qw(:standard);

print h1('This is a level one header');
print p('This is a paragraph.');
print p('Here is some',i('italicized'),'text.');

% html.pl


This is a level one header



This is a paragraph.



Here is some italicized text.




The module's documentation will tell you what function groups are defined. To import the default functions, plus optional ones, use the group ":DEFAULT".

use CGI::Pretty qw(:DEFAULT :standard start_html);



Where are Modules Installed?

Module files end with the extension .pm. If the module name is a simple one, like Env, then Perl will look for a file named Env.pm. If the module name is separated by :: sections, Perl will treat the :: characters like directories. So it will look for the module File::Basename in the file File/Basename.pm

Perl searches for module files in a set of directories specified by the Perl library path. This is set when Perl is first installed. You can find out what directories Perl will search for modules in by issuing perl -V from the command line:

% perl -V
Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
Platform:
osname=linux, osvers=2.4.2-2smp, archname=i686-linux
...
Compiled at Oct 11 2001 11:08:37
@INC:
/usr/lib/perl5/5.6.1/i686-linux
/usr/lib/perl5/5.6.1
/usr/lib/perl5/site_perl/5.6.1/i686-linux
/usr/lib/perl5/site_perl/5.6.1
/usr/lib/perl5/site_perl
.

You can modify this path to search in other locations by placing the use lib command somewhere at the top of your script:

#!/usr/bin/perl

use lib '/home/lstein/lib';
use MyModule;
...

This tells Perl to look in /home/lstein/lib for the module MyModule before it looks in the usual places. Now you can install module files in this directory and Perl will find them.

The Anatomy of a Module File

Here is a very simple module file named "MySequence.pm":

package MySequence;
#file: MySequence.pm

use strict;
our $EcoRI = 'ggatcc';

sub reversec {
my $sequence = shift;
$sequence = reverse $sequence;
$sequence =~ tr/gatcGATC/ctagCTAG/;
return $sequence;
}

sub seqlen {
my $sequence = shift;
$sequence =~ s/[^gatcnGATCN]//g;
return length $sequence;
}

1;

A module begins with the keyword package and ends with "1;". package gives the module a name, and the 1; is a true value that tells Perl that the module compiled completely without crashing.

The our keyword declares a variable to be global to the module. It is similar to my, but the variable can be shared with other programs and modules ("my" variables cannot be shared outside the current file, subroutine or block). This will let us use the variable in other programs that depend on this module.

To install this module, just put it in the Perl module path somewhere, or in the current directory.
Using the MySequence.pm Module

Using this module is very simple:

#!/usr/bin/perl
#file: sequence.pl

use strict;
use MySequence;

my $sequence = 'gattccggatttccaaagggttcccaatttggg';
my $complement = MySequence::reversec($sequence);

print "original = $sequence\n";
print "complement = $complement\n";

% sequence.pl
original = gattccggatttccaaagggttcccaatttggg
complement = cccaaattgggaaccctttggaaatccggaatc

Unless you explicitly export variables or functions, the calling function must explicitly qualify each MySequence function by using the notation:

MySequence::function_name

For a non-exported variable, the notation looks like this:

$MySequence::EcoRI


Exporting Variables and Functions from Modules

To make your module export variables and/or functions like a "real" module, use the Exporter module.

package MySequence;
#file: MySequence.pm

use strict;
use base 'Exporter';

our @EXPORT = qw(reversec seqlen);
our @EXPORT_OK = qw($EcoRI);

our $EcoRI = 'ggatcc';

sub reversec {
my $sequence = shift;
$sequence = reverse $sequence;
$sequence =~ tr/gatcGATC/ctagCTAG/;
return $sequence;
}

sub seqlen {
my $sequence = shift;
$sequence =~ s/[^gatcnGATCN]//g;
return length $sequence;
}

1;

The use base 'Exporter' line tells Perl that this module is a type of "Exporter" module. As we will see later, this is a way for modules to inherit properties from other modules. The Exporter module (standard in Perl) knows how to export variables and functions.

The our @EXPORT = qw(reversec seqlen) line tells Perl to export the functions reversec and seqlen automatically. The our @EXPORT_OK = qw($EcoRI) tells Perl that it is OK for the user to import the $EcoRI variable, but not to export it automatically.

The qw() notation is telling Perl to create a list separated by spaces. These lines are equivalent to the slightly uglier:

our @EXPORT = ('reversec','seqlen');

Using the Better MySequence.pm Module

Now the module exports its reversec and seqlen functions automatically:

#!/usr/bin/perl
#file: sequence2.pl

use strict;
use MySequence;

my $sequence = 'gattccggatttccaaagggttcccaatttggg';
my $complement = reversec($sequence);

print "original = $sequence\n";
print "complement = $complement\n";

The calling program can also get at the value of the $EcoRI variable, but he has to ask for it explicitly:

#!/usr/bin/perl
#file: sequence3.pl

use strict;
use MySequence qw(:DEFAULT $EcoRI);

my $sequence = 'gattccggatttccaaagggttcccaatttggg';
my $complement = reversec($sequence);

print "original = $sequence\n";
print "complement = $complement\n";

if ($complement =~ /$EcoRI/) {
print "Contains an EcoRI site.\n";
} else {
print "Doesn't contain an EcoRI site.\n";
}

Note that we must now import the :DEFAULT group in order to get the default reversec and seqlen functions.


Object-Oriented Modules

Some modules are object-oriented. Instead of importing a series of subroutines that are called directly, these modules define a series of object types that you can create and use. We will talk about object-oriented syntax in greater detail in the Perl References and Objects lecture. Here we will just show an example:
The Math::Complex Module

The Math::Complex module is a standard module that implements complex numbers. You work with it by creating one or more Math::Complex objects. You can then manipulate these objects mathematically by adding them, subtracting them, multiplying, and so on. Here is a brief example:

#!/usr/bin/perl
# file: complex.pl

use strict;
use Math::Complex;

my $a = Math::Complex->make(5,6);
my $b = Math::Complex->make(10,20);
my $c = $a * $b;

print "$a * $b = $c\n";

We load the Math::Complex module with use, but now instead of calling imported subroutines, we create two objects named $a and $b. Both are created by calling Math::Complex->make() with two arguments. The first argument is the real part of the complex number, and the second is the imaginary part. The return value from make() is the complex number object. We multiply the two numbers together and store the result in $c. Finally, we print out all three values. The script's output is:

51% perl complex.pl
5+6i * 10+20i = -70+160i

Object Syntax

The call to make() uses Perl's object-oriented syntax. Read it as meaning "invoke the make() subroutine that is located inside the Math::Complex package." The call is similar, but not quite equivalent, to this:

Math::Complex::make(10,20)

The difference is that the object-oriented syntax tells Perl to pass the name of the module as an implicit first argument to make(). Therefore, Math::Complex->make(10,20) is almost exactly equivalent to this:

Math::Complex::make('Math::Complex',10,20)

If you are using object-oriented modules, you will never have to worry about this extra argument. If you are writing object-oriented modules, the necessity for the extra argument will make sense to you.

No comments: