Tuesday, November 3, 2009

Installation and usage of fakeap tool

Download fakeap-0.3.2-1.0.rh7.rf.noarch.rpm and hostapd-0.6.9.tar.gz.
Install the bridge-utils-1.1-2 RPM.

Installing hostapd:-
Untar hostapd-0.6.9.tar.gz.
cd hostapd-0.6.9/hostapd/
Copy the defconfig file to a file named .config (a hidden file):
cp defconfig .config

Uncomment CONFIG_DRIVER_MADWIFI=y and CFLAGS += -I../../madwifi in .config.
Change the CFLAGS path to the directory where the madwifi source code exists. In my case it is CFLAGS += -I/home/softwares/madwifi-0.9.4/
Then run make and make install.

Now for fakeap, simply run 'rpm -ivh fakeap-0.3.2-1.0.rh7.rf.noarch.rpm'.

Using fakeap tool:-
#rmmod ath_pci
#modprobe ath_pci autocreate=ap
#ifconfig ath0 0.0.0.0 up
#ifconfig eth0 0.0.0.0 up
#brctl addbr br0
#brctl addif br0 ath0
#brctl addif br0 eth0
#wlanconfig ath create wlandev wifi0 wlanmode monitor
#ifconfig ath1 up
Create madwifi.conf with vi and add the lines below:
#--------------------------------------------------
# Configuration File for WPA-PSK
interface=ath0
bridge=br0
driver=madwifi
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
debug=0
dump_file=/tmp/hostapd.dump
ctrl_interface=/var/run/hostapd
ctrl_interface_group=0
ssid=test_ssid
macaddr_acl=0
auth_algs=3
wpa=3
wpa_passphrase=XXXXXXXXXXX
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP CCMP
#---------------------------------------------------
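hostapd expects wpa_passphrase to be 8 to 63 characters long. Below is a minimal sketch, assuming a POSIX shell, that checks the value before you start hostapd; the function name valid_wpa_passphrase is hypothetical and not part of hostapd.

```shell
#!/bin/sh
# Sketch: reject a wpa_passphrase outside the 8..63 character range
# before starting hostapd with it. valid_wpa_passphrase is a hypothetical helper.
valid_wpa_passphrase() {
    len=${#1}                      # length of the first argument
    [ "$len" -ge 8 ] && [ "$len" -le 63 ]
}

# Hypothetical usage: read the value from madwifi.conf if the file exists.
if [ -f madwifi.conf ]; then
    pass=$(sed -n 's/^wpa_passphrase=//p' madwifi.conf)
    valid_wpa_passphrase "$pass" ||
        echo "madwifi.conf: wpa_passphrase must be 8-63 characters" >&2
fi
```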

#./hostapd -B madwifi.conf
#perl /usr/bin/fakeap
Run without arguments, it prints the help.
Example usage:-
perl /usr/bin/fakeap --interface ath0 --words /usr/share/doc/fakeap-0.3.2/lists/stefan-wordlist.txt --vendors /usr/share/doc/fakeap-0.3.2/lists/stefan-maclist.txt --channel 6 --sleep 0.01
On another PC, open Wireshark on a wireless interface in monitor mode and check for beacon frames with many different SSIDs and MAC addresses.

Monday, October 26, 2009

downloading linux iso using torrents.

Search Google for a torrent link to the ISO you want.
I searched for Ubuntu and downloaded "ubuntu-9.04-desktop-i386.iso.torrent"; it's about 24 KB.

If you don't have a torrent client on Linux, download Vuze (formerly Azureus):
http://sourceforge.net/projects/azureus/files/vuze/vuze-4.2.0.8/Vuze_4.2.0.8_linux.tar.bz2/download
tar -jxvf Vuze_4.2.0.8_linux.tar.bz2
cd vuze/
./azureus &
This opens the Vuze GUI. (Vuze is also available for Windows.)
Press Ctrl+O or choose File -> Open -> Torrent File.
Click "Add File", then browse to and select the torrent file you downloaded.
The torrent now starts downloading the file.

Tuesday, October 13, 2009

configure Bugzilla with gmail smtp

This article explains how to use Gmail SMTP (TLS authentication) to send Bugzilla alerts.
Bugzilla supports the following methods to send mail alerts:
  • sendmail
  • SMTP
For the SMTP method, Bugzilla uses the Email::Send::SMTP Perl module. Gmail's SMTP server requires TLS (Transport Layer Security) before it accepts authentication, so Email::Send::SMTP cannot be used for it.
We need to use another Perl module, Email::Send::SMTP::TLS.
The first step is to install Email::Send::SMTP::TLS from CPAN, in one of the following ways:
  • Using the CPAN shell
  • Launch the CPAN shell as follows:
#cpan
cpan> install Email::Send::SMTP::TLS
The CPAN shell might ask you to install additional modules; install them.

  • Compiling from source
  • Download the source (Email-Send-SMTP-TLS-0.03.tar.gz) from CPAN and execute the following commands:
    1. tar zxvf Email-Send-SMTP-TLS-0.03.tar.gz
    2. cd Email-Send-SMTP-TLS-0.03
    3. perl Makefile.PL
    4. make
    5. make install
Once Email::Send::SMTP::TLS is installed, Bugzilla should offer SMTP::TLS as a method to send alerts. You can verify this by logging in to Bugzilla as admin and going to Administration -> Parameters -> Email.
Now we need to tweak the Bugzilla code. Follow the instructions below carefully, and don't forget to back up each file before you modify it.
Go to your Bugzilla installation directory and execute the following command:

cd Bugzilla
Open the Mailer.pm file.
Search for the following code (it should be on line 57):
sub MessageToMTA {
and add the following line after it:

my ($smtp_server,$smtp_port);

Search for the following if block:

if ($method eq "SMTP") {
and change the whole block to:
if ($method eq "SMTP" || $method eq "SMTP::TLS") {
    # smtpserver holds "host" or "host:port", e.g. smtp.gmail.com:587
    ($smtp_server,$smtp_port) = split /:/,Bugzilla->params->{"smtpserver"};
    push @args,
        Host     => $smtp_server,
        User     => Bugzilla->params->{"smtp_username"},
        Password => Bugzilla->params->{"smtp_password"},
        Hello    => $hostname,
        Debug    => Bugzilla->params->{'smtp_debug'};
    # pass Port only when smtpserver included one
    push @args, Port => $smtp_port if ($smtp_port);
}
Now we need to change the Bugzilla parameters.
Log in to Bugzilla as administrator, go to Administration -> Parameters -> Email and make the following settings:
1. Select SMTP::TLS as mail_delivery_method
2. Enter your Gmail address in mailfrom
3. Enter smtp.gmail.com:587 in smtpserver
4. Enter your full Gmail address (you@gmail.com) in smtp_username
5. Enter your Gmail password in smtp_password
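To make the smtpserver handling of the patched if block concrete, here is a minimal shell sketch of the same split: the text before the colon becomes Host, and Port is passed only when something follows the colon. The function split_smtpserver is hypothetical and only mirrors the Perl code for illustration; unlike Perl's split, it keeps everything after the first colon as the port.

```shell
#!/bin/sh
# Sketch mirroring: ($smtp_server,$smtp_port) = split /:/, smtpserver
#                   push @args, Port => $smtp_port if ($smtp_port);
split_smtpserver() {
    server=$1
    host=${server%%:*}             # everything before the first colon
    case $server in
        *:*) port=${server#*:} ;;  # everything after the first colon
        *)   port= ;;              # no colon: no Port argument
    esac
    if [ -n "$port" ]; then
        printf 'Host=%s Port=%s\n' "$host" "$port"
    else
        printf 'Host=%s\n' "$host"
    fi
}
```

For example, split_smtpserver smtp.gmail.com:587 prints Host=smtp.gmail.com Port=587.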

Bugzilla should now send its alerts through Gmail.




Wednesday, October 7, 2009

Install vpnclient and perforce on centos 5.1

INSTALL VPNCLIENT
Download the vpnclient-linux-x86_64-4.8.02.0030-k9.tar.gz file.
tar -zxvf vpnclient-linux-x86_64-4.8.02.0030-k9.tar.gz
cd vpnclient
./vpn_install
/etc/init.d/vpnclient_init start
Copy cust.pcf (your VPN profile for connecting to your customer's network) to /etc/opt/cisco-vpnclient/Profiles/.
To connect to your customer, type:
#vpnclient connect cust
Here cust is the name of the .pcf file (without the .pcf extension) that you copied to /etc/opt/cisco-vpnclient/Profiles/.
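The two profile steps above can be sketched as a small script. This assumes the default profile directory used above; profile_name and install_profile are hypothetical helper names, and PROFILE_DIR can be overridden (for example to test without root).

```shell
#!/bin/sh
# Sketch: install a .pcf profile and print the matching connect command.
# The profile name vpnclient expects is the file name without ".pcf".
PROFILE_DIR=${PROFILE_DIR:-/etc/opt/cisco-vpnclient/Profiles}

profile_name() {
    basename "$1" .pcf             # /path/cust.pcf -> cust
}

install_profile() {
    cp "$1" "$PROFILE_DIR/" || return 1
    echo "vpnclient connect $(profile_name "$1")"
}
```

Run as root, install_profile cust.pcf copies the file and prints the command to connect.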


INSTALL PERFORCE

Download p4v from http://www.perforce.com/perforce/downloads/index.html
tar -zxvf p4v.tgz
cd p4v-2009.1.212209
You can run Perforce (p4v) directly from here.

Wednesday, September 9, 2009

network tools for wireless and wired

SecurityTube has a series called "Tutorials on commonly used Security Tools", covering tools for wireless and wired networks, with a video explanation for each tool. So what are you waiting for?

Besides these, you can find several other tools on the site; have a look.

Tuesday, September 8, 2009

how to install madwifi driver for linux with and without the packet resend (retry) option.

The madwifi driver is written for Atheros chipsets and supports most of them.
It is open source but depends on the proprietary Hardware Abstraction Layer (HAL).

Download and Check

$ wget http://ufpr.dl.sourceforge.net/sourceforge/madwifi/madwifi-0.9.4.tar.bz2
$ tar -jxvf madwifi-0.9.4.tar.bz2
$ cd madwifi-0.9.4/
$ cd scripts/
$ ./madwifi-unload.bash
$ ./find-madwifi-modules.sh $(uname -r)
$ cd ..
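A .tar.bz2 archive needs tar's -j flag (bzip2), not -z (gzip). A minimal sketch that picks the flag from the file extension; tar_flags_for and extract are hypothetical helper names. Recent GNU tar releases also detect the compression on their own when you extract with plain tar -xvf.

```shell
#!/bin/sh
# Sketch: choose tar's decompression flag from the archive's extension.
tar_flags_for() {
    case $1 in
        *.tar.bz2|*.tbz2) printf '%s\n' "-jxvf" ;;   # bzip2
        *.tar.gz|*.tgz)   printf '%s\n' "-zxvf" ;;   # gzip
        *.tar)            printf '%s\n' "-xvf"  ;;   # uncompressed
        *) return 1 ;;                               # unknown type
    esac
}

extract() {
    flags=$(tar_flags_for "$1") || { echo "unknown archive type: $1" >&2; return 1; }
    tar "$flags" "$1"
}
```

With it, extract madwifi-0.9.4.tar.bz2 runs tar -jxvf on the archive.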

Build and comment

$ make

Checking requirements... ok.
Checking kernel configuration... ok.
make -C /lib/modules/2.6.18-92.1.13.el5/build SUBDIRS=/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4 modules
make[1]: Entering directory `/usr/src/kernels/2.6.18-92.1.13.el5-i686'
CC [M] /home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4/ath/if_ath.o
In file included from :1:
/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4/ath/../include/compat.h:140: error: redefinition of 'skb_end_pointer'
...
...
...
make[3]: *** [/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4/ath/if_ath.o] Error 1
make[2]: *** [/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4/ath] Error 2
make[1]: *** [_module_/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-92.1.13.el5-i686'
make: *** [modules] Error 2

The error comes from include/compat.h redefining skb_end_pointer, which this kernel (2.6.18-92.1.13.el5) already provides. Fix it by commenting out that compatibility block in compat.h (wrapping it in /* ... */):

$ cd include
$ cp compat.h compat.h.old
$ vim compat.h
$ diff -U 3 -dHrN -- compat.h compat.h.old > compat.h.diff
$ cat compat.h.diff



--- compat.h 2009-03-19 02:02:49.000000000 -0400
+++ compat.h.old 2009-03-18 19:09:37.000000000 -0400
@@ -134,7 +134,7 @@
#define IRQF_SHARED SA_SHIRQ
#endif
-/* #if LINUX_VERSION_CODE < ...
...mac.raw = skb->data;
}
-#endif */
+#endif
#if LINUX_VERSION_CODE < ...

$ cd ../
$ make

Checking requirements... ok.
Checking kernel configuration... ok.
make -C /lib/modules/2.6.18-92.1.13.el5/build SUBDIRS=/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4 modules
make[1]: Entering directory `/usr/src/kernels/2.6.18-92.1.13.el5-i686'
CC [M] /home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4/ath/if_ath.o



make[1]: Entering directory `/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4/tools'
gcc -o athstats -g -O2 -Wall -I. -I../hal -I.. -I../ath athstats.c
gcc -o 80211stats -g -O2 -Wall -I. -I../hal -I.. 80211stats.c
gcc -o athkey -g -O2 -Wall -I. -I../hal -I.. athkey.c
gcc -o athchans -g -O2 -Wall -I. -I../hal -I.. athchans.c
gcc -o athctrl -g -O2 -Wall -I. -I../hal -I.. athctrl.c
gcc -o athdebug -g -O2 -Wall -I. -I../hal -I.. athdebug.c
gcc -o 80211debug -g -O2 -Wall -I. -I../hal -I.. 80211debug.c
gcc -o wlanconfig -g -O2 -Wall -I. -I../hal -I.. wlanconfig.c
gcc -o ath_info -g -O2 -Wall ath_info.c
make[1]: Leaving directory `/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4/tools'

# make install

sh scripts/find-madwifi-modules.sh 2.6.18-92.1.13.el5
for i in ath/ ath_hal/ ath_rate/ net80211/; do \
make -C $i install || exit 1; \
done



install -d /usr/local/man/man8
install -m 0644 man/*.8 /usr/local/man/man8
make[1]: Leaving directory `/home/rafa/0_Down/1_Source/WireLess/madwifi-0.9.4/tools'

Configuration and re-boot

# modprobe ath_pci
# iwconfig

lo no wireless extensions.

eth1 no wireless extensions.

eth2 no wireless extensions.

sit0 no wireless extensions.

wifi0 no wireless extensions.

ath0 IEEE 802.11b ESSID:""

Mode:Managed Channel:0 Access Point: Not-Associated
Bit Rate:0 kb/s Tx-Power:0 dBm Sensitivity=1/1
Retry:off RTS thr:off Fragment thr:off
Encryption key:off
Power Management:off
Link Quality=0/70 Signal level=-256 dBm Noise level=-256 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0

# ifconfig ath0 up
# wlanconfig ath0 list scan

This shows all available BSSIDs.


--------------------------------------------------------------------------------------------------------------

If you are doing protocol testing and want to control how many packets go out from your Wi-Fi interface (that is, send each frame only once, with no hardware retries), you need to make two simple changes before compiling.

step 1:-

In madwifi-0.9.4/ath/if_athvar.h file change line '#define ATH_TXMAXTRY 11' to '#define ATH_TXMAXTRY 1'.

step 2:-

In madwifi-0.9.4/ath/if_ath.c, replace the line 'sc->sc_mrretry = ath_hal_setupxtxdesc(ah, NULL, 0,0,0,0,0,0);' with 'sc->sc_mrretry = 0;'.

Instead of 0 you can write FALSE and define a macro for it.
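The two edits in step 1 and step 2 can be scripted. This is a minimal sketch, assuming GNU sed and that the two lines look as quoted above (spacing may differ); disable_tx_retries is a hypothetical helper, and it keeps .orig backups so you can diff the result before compiling.

```shell
#!/bin/sh
# Sketch: apply both retry-disabling edits with GNU sed, keeping .orig backups.
disable_tx_retries() {
    src=${1:-.}    # top of the madwifi-0.9.4 source tree
    # Step 1: "#define ATH_TXMAXTRY 11" -> "#define ATH_TXMAXTRY 1".
    # In the replacement, \1 is the captured prefix; the 1 after it is literal.
    sed -E -i.orig 's/^(#define[[:space:]]+ATH_TXMAXTRY[[:space:]]+)11\b/\11/' \
        "$src/ath/if_athvar.h" || return 1
    # Step 2: replace the multi-rate retry setup call with a plain 0.
    sed -E -i.orig 's/sc[[:space:]]*->[[:space:]]*sc_mrretry[[:space:]]*=[[:space:]]*ath_hal_setupxtxdesc\([^;]*\);/sc->sc_mrretry = 0;/' \
        "$src/ath/if_ath.c"
}
```

Run it from the top of the source tree as disable_tx_retries . and then check the changes with diff ath/if_ath.c.orig ath/if_ath.c.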


Now start compilation.

--------------------------------------------------------------------------------------------

installing pcap2air and other airbase tools for wifi

To work with the Airbase Wi-Fi tools we need LORCON.
LORCON is a library specializing in 802.11 (Wi-Fi) frame injection, and Airbase is a suite built on top of it. The suite is written in C++ and consists of the applications Airware-test, fuzz-e, pcap2air, pcap-match, pcap-tac, pcap-wepcrypt, prism-strip and simple-replay.

Currently, some Airbase programs call LORCON library functions that are now considered obsolete (deprecated). We'll see how to modify the code to fit the new library.

Download and install Lorcon (SVN):

$ svn co https://802.11ninja.net/svn/lorcon/trunk/

(If the svn client is not installed on your machine, just run #yum install subversion)

$ cd trunk/
$ ./configure
$ make
# make install

Download Airbase 2.40: http://www.802.11mercenary.net/downloads/
Files to modify:

airbase-svn-223/80211fp/jc-CTS-printer/src/boring.cpp
airbase-svn-223/80211fp/jc-duration-printer/src/duration_pcap_preprocessor.cpp
airbase-svn-223/libs/lib802finger/src/station-lister.cpp
airbase-svn-223/tools/pcap2air/boring.cpp
airbase-svn-223/tools/simple-replay/boring.cpp
airbase-svn-223/tools/fuzz-e/boring.cpp

To modify:

In each of these files, replace every occurrence of:

tx80211_setmode

with:

tx80211_setfunctionalmode
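The replacement can be done with sed in one go. A minimal sketch, assuming GNU sed (for \b and -i) and that it is run after extracting the tarball, from the directory that contains airbase-svn-223; replace_setmode is a hypothetical helper, and like the steps above it only renames the call.

```shell
#!/bin/sh
# Sketch: rename the deprecated tx80211_setmode call to
# tx80211_setfunctionalmode in each given file, keeping a .orig backup.
replace_setmode() {
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
        sed -i.orig 's/\btx80211_setmode\b/tx80211_setfunctionalmode/g' "$f" || return 1
    done
}
```

Pass it the six files listed above, for example replace_setmode airbase-svn-223/tools/pcap2air/boring.cpp airbase-svn-223/tools/fuzz-e/boring.cpp and so on.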

Installing Airbase-release-2.40:


$ tar -zxvf airbase-release-2.40.tar.gz
$ cd airbase-svn-223
$ cd libs/libairware/
$ make
# make install
$ cd ../../tools/
$ ./build.sh
# ./install.sh

# ln -s /usr/local/bin/airbase/* /usr/sbin/

If any tool complains about the missing shared library liborcon-1.0.0.so, then run:

cp /usr/local/lib/liborcon-1.0.0.so /usr/lib

The following Airbase tools should now be available:

Airware-test
fuzz-e
pcap2air
pcap-match
pcap-tac
pcap-wepcrypt
prism-strip
simple-replay

Friday, August 28, 2009

how to install windows after linux is installed.

I have seen many people having trouble installing Windows on a machine that already has Linux installed.
Here are simple steps to do it.
(Please back up your important data before doing this.)

On a Linux shell, run fdisk -l.

It shows all disks and their partitions.
Free up one of the partitions for Windows.

Restart the system and install Windows on that partition (make sure you choose the right partition).

If you reboot now, only Windows will come up. (This is expected: the Windows installer replaces the boot loader in the MBR.)

Now insert a bootable Linux CD or DVD and restart.
Boot from the CD or DVD.
At the boot: prompt,
type "linux rescue" (without quotes) and press Enter (this is for CentOS).
Follow the steps; they lead to a shell prompt.

Now mount your boot partition if /boot is a separate partition; if the root partition itself contains /boot, mount the root partition.

ex: mount /dev/sda2 /mnt/temp (create /mnt/temp with mkdir first if it doesn't exist)
cd /mnt/temp

Now open menu.lst (it is boot/grub/menu.lst under /mnt/temp if you mounted the root partition, or grub/menu.lst if you mounted a separate boot partition).
Add the lines below:

title Windows XP
root (hd0,0)
makeactive
chainloader +1

Modify the parameters to match your partitions.
title :- The name you want to see in the boot menu.
root :- The partition where Windows is installed. GRUB names every disk "hd", even if you have a SATA hard disk. Also note that fdisk -l numbers partitions starting from 1
(ex:- sda1, sda2, ...),
but GRUB numbers both disks and partitions starting from 0: (hd0,0), (hd0,1), (hd0,2), ...
So, for example, if Windows is on sda3 you need to give "root (hd0,2)".
The other two parameters don't change.
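GRUB legacy names a partition (hdX,Y), with both the disk X and the partition Y counted from 0. Below is a minimal sketch of that translation for names like /dev/sda3; to_grub is a hypothetical helper, and it assumes the BIOS disk order matches sda, sdb, ... (usual, but check /boot/grub/device.map if you have several disks).

```shell
#!/bin/sh
# Sketch: translate a Linux partition name into GRUB legacy (hdX,Y) notation.
# sda1 -> (hd0,0), sda3 -> (hd0,2), sdb1 -> (hd1,0); hdaN is treated the same.
to_grub() {
    dev=${1#/dev/}                       # /dev/sda3 -> sda3
    case $dev in
        [sh]d[a-z][0-9]*) ;;             # looks like sdXN or hdXN
        *) return 1 ;;
    esac
    part=${dev#??[a-z]}                  # sda3 -> 3
    case $part in
        ''|*[!0-9]*) return 1 ;;         # partition must be a number
    esac
    [ "$part" -ge 1 ] || return 1
    rest=${dev#??}                       # sda3 -> a3
    letter=${rest%"$part"}               # a3 -> a
    letters=abcdefghijklmnopqrstuvwxyz
    prefix=${letters%%"$letter"*}        # letters before ours: disk index
    echo "(hd${#prefix},$((part - 1)))"
}
```

For the example in the text, to_grub /dev/sda3 prints (hd0,2).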

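Now, at the shell, run the grub command to put GRUB back in the MBR. Here root is the partition that holds /boot/grub (sda2 in the example above, i.e. (hd0,1)):
ex:
sh# grub
grub> root (hd0,1)
grub> setup (hd0)
grub> quit
sh# reboot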

After the reboot, the boot menu will list both Linux and Windows.
Choose the one you want to boot.
Enjoy!

Thursday, August 20, 2009

commands I use while programming

VIM :-

:set nonu -- remove line numbers
:sp file_name -- open file_name in a split window; Ctrl+w w switches between split windows
:$ --------------------- go to end of file
:1 ---------------------- go to start of file (the number is a line number)
:%s/OLD/NEW/g -- replace OLD with NEW in the whole file (%), every match on each line (g)
:%s/^/NEW/g -- add NEW at the start of every line
:1,30s/rr/aa/gi -- 1,30 is the line range (lines 1 to 30), s stands for substitute, rr is the search pattern, aa is the replacement, g means every match on a line, i means case-insensitive


ctags * -- run this at the command prompt first; it generates the tags file
Ctrl+] -- go to function definition
Ctrl+o -- come back
cscope -R -- browse the code recursively; Ctrl+d exits, Tab moves between the results and the commands below

Enabling and debugging core dumps:
Add the two lines below to your .bashrc file and restart the shell (the second line writes to /proc and only works as root).
ulimit -c unlimited
echo core.%e.%p.%s.%t > /proc/sys/kernel/core_pattern
Now if a core dump file is generated, load it into gdb with the proper file names:
gdb executable core.xxx
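With the pattern core.%e.%p.%s.%t, each core file name carries the executable name, PID, signal number and time of the dump. A minimal sketch that splits such a name back into fields, so you know which executable to pass to gdb; parse_core_name is a hypothetical helper, and the kernel may truncate long executable names in %e.

```shell
#!/bin/sh
# Sketch: split a core file name written with core.%e.%p.%s.%t.
# Fields are peeled off from the right, so a name with dots (a.out) survives.
parse_core_name() {
    name=${1##*/}                        # drop any directory part
    name=${name#core.}                   # drop the "core." prefix
    t=${name##*.}; name=${name%.*}       # %t: time of dump (seconds since epoch)
    s=${name##*.}; name=${name%.*}       # %s: signal number
    p=${name##*.}; e=${name%.*}          # %p: PID, %e: executable name
    printf 'exe=%s pid=%s signal=%s time=%s\n' "$e" "$p" "$s" "$t"
}
```

For core.myprog.1234.11.1257000000 it reports exe=myprog and signal=11 (SIGSEGV), so the next step is gdb ./myprog core.myprog.1234.11.1257000000.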

CMDS:-

route -- shows the routing table
route add default gw <gateway_ip> -- adds a default gateway to the routing table
nm obj_file | grep function_name_to_find -- checks whether a symbol appears in an object file


diff file1 file2 -- lists the lines that differ
vi -t functionname -- opens the file at that function (ctags * must be run first)



svn:

svn co svn+ssh://phaneendra@/repos/project

svn add dir1 dir2 file1 ....
svn ci

svn import dir/file svn+ssh://phaneendra@/repos/project

svn diff

svn ci file_name

If you want to change the user name for a checked-out code base (for example, your admin changed your user name) and you now want to check in code that you checked out under the old name,
you have two options:
1)
Check out with your new name, take the diffs of the code you modified (checked out with the old name), apply them to the new checkout and then check in. (Of course, a very, very bad idea.)

2) The best way is to run the command below at trunk (the top directory of the checked-out code):

svn switch svn+ssh://old_user_name@ip/repos/trunk svn+ssh://new_user_name@ip/repos/trunk --relocate
It asks for the new user's password; provide it and your work is done.
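Instead of typing both URLs by hand, the new URL can be derived from the current one. A minimal sketch, assuming an svn+ssh://user@host/... URL and the svn switch --relocate form shown above; with_user and relocate_wc are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: swap the user name in an svn+ssh:// URL, then relocate the
# working copy to the rewritten URL.
with_user() {        # $1 = URL, $2 = new user name
    printf '%s\n' "$1" | sed "s|^svn+ssh://[^@/]*@|svn+ssh://$2@|"
}

relocate_wc() {      # run at trunk of the working copy; $1 = new user name
    old=$(svn info | sed -n 's/^URL: //p')
    svn switch --relocate "$old" "$(with_user "$old" "$1")"
}
```

Running relocate_wc new_user_name at trunk performs the same relocation as the command above.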


to open .chm files in ubuntu

# apt-get install gnochm

$ gnochm file.chm

Cross compilation:-
source env-setup /usr/local//
This sets up the environment variables. Run it before cross compiling, or put it in your .bashrc file, open a new terminal and start working.



web :-

http://cscope.sourceforge.net/large_projects.html
http://www.network-theory.co.uk/ (for gcc and valgrind)
http://www.experts-exchange.com/Programming/System/Linux/
http://www.securitytube.net/Programming-Video-List.aspx

Learn perl easy part5

What is a Subroutine?

We have been using a form of subroutines all along. Perl functions are basically built-in subroutines. You call (or "invoke") a function by typing its name and giving it one or more arguments.

Example: Length

my $seq = 'ATGCAAATGCCA';

my $seq_length = length $seq; ## OR
$seq_length = length($seq);   # the same call, with parentheses

# $seq_length now contains 12

Perl gives you the opportunity to define your own functions, called "subroutines". In the simplest sense, subroutines are named blocks of code that can be reused as many times as you wish.

Example: A very basic subroutine


sub Hello {
print "Hello World!!\n";
}

print "Sometimes I just want to shout ";
Hello(); #or &Hello;


Example: Some simple subroutines


sub hypotenuse {
my ($side1,$side2) = @_;   # avoid $a and $b, which sort() uses
return sqrt($side1**2 + $side2**2);
}
sub E {
return 2.71828182845905;   # approximately exp(1)
}
#########

my $y = 3;
my $x = hypotenuse($y,4);
# $x now contains 5

$x = hypotenuse((3*$y),12);
# $x now contains 15

my $value_e = E();
# $value_e now contains 2.71828182845905

This way of using subroutines makes them look suspiciously like functions. Note: unlike a built-in function, you must use parentheses when calling a subroutine this way, even if you are giving it no arguments, unless the subroutine was declared earlier in the file.
The Magic Array - @_

Perhaps the most important concept to understand is that values are passed to the subroutine in the default array @_. This array springs magically into existence and contains the list of values that you gave to the subroutine (within the parentheses).

Example: The magic of @_


sub Add_two_numbers {
my $number1 = shift; # get first argument from @_ and put it in $number1
my $number2 = shift; # get second argument from @_ and put it in $number2

my $sum = $number1 + $number2;
return $sum;
}

sub Add_two_numbers_2 {
my ($number1,$number2) = @_;
my $sum = $number1 + $number2;
return $sum;
}

sub Add_two_numbers_arcane {
return ($_[0] + $_[1]);
}

Some Subroutine Notes

* Use a name for your subroutine that makes sense to you. Avoid using names that Perl already uses (like "length" or "print"), unless you really like making yourself miserable.
* If you don't give a return statement, the subroutine will return the last value calculated.
* You may have multiple return statements. The first one that is executed will exit the subroutine

Example: A more complex subroutine with different returns


sub Number_Examiner {
my $number = shift;

unless ($number =~ /^\d+$/){
return "You sure this is a number?";
}
if ($number >= 100){
return "Big Number!";
}
elsif ($number > 50){
return "Bigger than 50!";
}
else {
return "Wee Little Number";
}
}

* You can return either a single value or a list of values. You can, if you wish, return nothing. Remember to use your subroutine in a way that reflects the number of values you expect to get back.

Example: Know what you expect


sub ReturnThreeValues { return (1, 2, 3); }   # hypothetical example sub

my ($value1,$value2,$value3) = ReturnThreeValues();
# if you are expecting three values back, make space for them.
my (@values) = ReturnThreeValues(); # another way to do it

($value1,$value2) = ReturnThreeValues();
# the last value is lost, gone, vanished, DOA... You may have
# wanted to do this.

"my" Variables

Variables that you use in a subroutine should be made private to that subroutine with the my operator. This avoids accidentally overwriting similarly-named variables in the main program. If you already included use strict at the top of your program, perl will check that all variables are introduced with my.

Why Use My?



my $var = "Boo!";
Scary();
print "$var\n";

sub Scary{
print "$var\n";
$var = "Eeek!";
}

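# The results:
Boo!
Eeek!

Scary() changed the main program's $var to "Eeek!", because it used that variable instead of declaring its own private copy with my.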

Variables made private with my only exist within a block (curly braces). The subroutine body is a block, so the my variables only exist within the body of the subroutine.

You can make scalars, arrays and hashes private. If you apply my() to a list, it makes each member of the list private.

{ # start a block
my $scalar; # $scalar is private
my @array; # now @array is private
my %hash; # %hash is private

# same thing, but in one swell foop
my ($scalar,@array,%hash);
}


Lecture 7:

1. Using a Module
2. Getting Module Documentation
3. Installing Modules
4. More About Importing
5. Where are Modules Installed?
6. The Anatomy of a Module
7. Exporting Variables & Functions from Modules
8. Using Object-Oriented Modules


Using a Module

A module is a package of useful subroutines and variables that someone has put together. Modules extend the ability of Perl.
Example 1: The File::Basename Module

The File::Basename module is a standard module that is distributed with Perl. When you load the File::Basename module, you get two new functions, basename and dirname.

basename takes a long UNIX path name and returns the file name at the end. dirname takes a long UNIX path name and returns the directory part.


#!/usr/bin/perl
# file: basename.pl

use strict;
use File::Basename;

my $path = '/bush_home/bush1/lstein/C1829.fa';
my $base = basename($path);
my $dir = dirname($path);

print "The base is $base and the directory is $dir.\n";

The output of this program is:

The base is C1829.fa and the directory is /bush_home/bush1/lstein.

The use function loads up the module named File::Basename and imports the two functions. If you didn't use use, then the program would print an error:

Undefined subroutine &main::basename called at basename.pl line 8.

Example 2: The Env Module

The Env module is a standard module that provides access to the environment variables. When you load it, it imports a set of scalar variables corresponding to your environment.

#!/usr/bin/perl
# file env.pl

use strict;
use Env;

print "My home is $HOME\n";
print "My path is $PATH\n";
print "My username is $USER\n";

When this runs, the output is:

My home is /bush_home/bush1/lstein
My path is /net/bin:/usr/bin:/bin:/usr/local/bin:/usr/X11R6/bin:/bush_home/bush1/lstein/bin:.
My username is lstein

Controlling What Gets Imported

Each module will automatically import a different set of variables and subroutines when you use it. You can control what gets imported by providing use with a list of what to import.

By default the Env module will import all the environment variables. You can make it import only some:

#!/usr/bin/perl
# file env2.pl

use strict;
use Env '$HOME','$PATH';

print "My home is $HOME\n";
print "My path is $PATH\n";
print "My username is $USER\n";

This time $USER was not imported, so use strict refuses to run the program:
Global symbol "$USER" requires explicit package name at env2.pl line 9.
Execution of env2.pl aborted due to compilation errors.

In general, you can import scalars, hashes, arrays and functions from a module by giving a list of strings containing the variable or function names. The Env module itself only deals in scalars and arrays: this line imports a scalar named $PATH and an array named @PATH, which holds the path split into its components.

#!/usr/bin/perl

use Env '$PATH','@PATH';

print join "\n",@PATH;

Output:

/net/bin
/usr/bin
/bin
/usr/local/bin
/usr/X11R6/bin
/bush_home/bush1/lstein/bin
.

You will often see the qw() operator used to reduce typing:

use TestModule qw($PATH $HOME @PATH printenv);


Finding out What Modules are Installed

Here are some tricks for finding out what Modules are installed.
Preinstalled Modules

To find out what modules come with perl, look in Appendix A of Perl 5 Pocket Reference. From the command line, use the perldoc command from the UNIX shell. All the Perl documentation is available with this command:

% perldoc perlmodlib
PERLMODLIB(1) User Contributed Perl Documentation PERLMODLIB(1)

NAME
perlmodlib - constructing new Perl modules and finding
existing ones

DESCRIPTION
THE PERL MODULE LIBRARY
Many modules are included the Perl distribution. These
are described below, and all end in .pm. You may discover
...
Standard Modules

Standard, bundled modules are all expected to behave in a
well-defined manner with respect to namespace pollution
because they use the Exporter module. See their own docu-
mentation for details.

AnyDBM_File Provide framework for multiple DBMs

AutoLoader Load subroutines only on demand

AutoSplit Split a package for autoloading

B The Perl Compiler
...

To learn more about a module, run perldoc with the module's name:

% perldoc File::Basename

NAME
fileparse - split a pathname into pieces

basename - extract just the filename from a path

dirname - extract just the directory from a path

SYNOPSIS
use File::Basename;

($name,$path,$suffix) = fileparse($fullname,@suffixlist)
fileparse_set_fstype($os_string);
$basename = basename($fullname,@suffixlist);
$dirname = dirname($fullname);
...

Optional Modules that You May Have Installed

perldoc perllocal will list the names of locally installed modules.

% perldoc perllocal
Thu Apr 27 16:01:31 2000: "Module" the DBI manpage

o "installed into: /usr/lib/perl5/site_perl"

o "LINKTYPE: dynamic"

o "VERSION: 1.13"

o "EXE_FILES: dbish dbiproxy"

Thu Apr 27 16:01:41 2000: "Module" the Data::ShowTable
manpage

o "installed into: /usr/lib/perl5/site_perl"

o "LINKTYPE: dynamic"

o "VERSION: 3.3"

o "EXE_FILES: showtable"

Tue May 16 18:26:27 2000: "Module" the Image::Magick man-
page
...


Installing Modules

You can find thousands of Perl Modules on CPAN, the Comprehensive Perl Archive Network:

http://www.cpan.org

Installing Modules Manually

Search for the module on CPAN using the keyword search. When you find it, download the .tar.gz module. Then install it like this:

% tar zxvf bioperl-0.7.1.tar.gz
bioperl-0.7.1/
bioperl-0.7.1/Bio/
bioperl-0.7.1/Bio/DB/
bioperl-0.7.1/Bio/DB/Ace.pm
bioperl-0.7.1/Bio/DB/GDB.pm
bioperl-0.7.1/Bio/DB/GenBank.pm
bioperl-0.7.1/Bio/DB/GenPept.pm
bioperl-0.7.1/Bio/DB/NCBIHelper.pm
bioperl-0.7.1/Bio/DB/RandomAccessI.pm
bioperl-0.7.1/Bio/DB/SeqI.pm
bioperl-0.7.1/Bio/DB/SwissProt.pm
bioperl-0.7.1/Bio/DB/UpdateableSeqI.pm
bioperl-0.7.1/Bio/DB/WebDBSeqI.pm
bioperl-0.7.1/Bio/AlignIO.pm

% perl Makefile.PL
Generated sub tests. go make show_tests to see available subtests
...
Writing Makefile for Bio

% make
cp Bio/Tools/Genscan.pm blib/lib/Bio/Tools/Genscan.pm
cp Bio/Root/Err.pm blib/lib/Bio/Root/Err.pm
cp Bio/Annotation/Reference.pm blib/lib/Bio/Annotation/Reference.pm
cp bioback.pod blib/lib/bioback.pod
cp Bio/AlignIO/fasta.pm blib/lib/Bio/AlignIO/fasta.pm
cp Bio/Location/NarrowestCoordPolicy.pm blib/lib/Bio/Location/NarrowestCoordPolicy.pm
cp Bio/AlignIO/clustalw.pm blib/lib/Bio/AlignIO/clustalw.pm
cp Bio/Tools/Blast/Run/postclient.pl blib/lib/Bio/Tools/Blast/Run/postclient.pl
cp Bio/LiveSeq/Intron.pm blib/lib/Bio/LiveSeq/Intron.pm
...
Manifying blib/man3/Bio::LiveSeq::Exon.3
Manifying blib/man3/Bio::Location::CoordinatePolicyI.3
Manifying blib/man3/Bio::SeqFeature::Similarity.3

% make test
PERL_DL_NONLAZY=1 /net/bin/perl -Iblib/arch -Iblib/lib -I/net/lib/perl5/5.6.1/i686-linux -I/net/lib/perl5/5.6.1 -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests @ARGV;' t/*.t
t/AAChange..........ok
t/AAReverseMutate...ok
t/AlignIO...........ok
t/Allele............ok
...
t/WWW...............ok
All tests successful, 95 subtests skipped.
Files=60, Tests=1011, 35 wallclock secs (25.47 cusr + 1.60 csys = 27.07 CPU)

% make install
Installing /net/lib/perl5/site_perl/5.6.1/bioback.pod
Installing /net/lib/perl5/site_perl/5.6.1/biostart.pod
Installing /net/lib/perl5/site_perl/5.6.1/biodesign.pod
Installing /net/lib/perl5/site_perl/5.6.1/bptutorial.pl
...

If you have an older version of the tar program, you may need to replace the first step with this:

% gunzip -c bioperl-0.7.1.tar.gz | tar xvf -

Installing Modules Using the CPAN Shell

Perl has a CPAN module installer built into it. You run it like this:

% perl -MCPAN -e shell

cpan shell -- CPAN exploration and modules installation (v1.59_54)
ReadLine support enabled

cpan>

From this shell, there are commands for searching for modules, downloading them, and installing them.

[The first time you run the CPAN shell, it will ask you a lot of configuration questions. Generally, you can just hit return to accept the defaults. The only trick comes when it asks you to select CPAN mirrors to download from. Choose any ones that are in your general area on the Internet and it will work fine.]

Here is an example of searching for the Text::Wrap module and installing it:

cpan> i /Wrap/
Going to read /bush_home/bush1/lstein/.cpan/sources/authors/01mailrc.txt.gz
CPAN: Compress::Zlib loaded ok
Going to read /bush_home/bush1/lstein/.cpan/sources/modules/02packages.details.txt.gz
Database was generated on Tue, 16 Oct 2001 22:32:59 GMT
CPAN: HTTP::Date loaded ok
Going to read /bush_home/bush1/lstein/.cpan/sources/modules/03modlist.data.gz
Distribution B/BI/BINKLEY/CGI-PrintWrapper-0.8.tar.gz
Distribution C/CH/CHARDIN/MailQuoteWrap0.01.tgz
Distribution C/CJ/CJM/Text-Wrapper-1.000.tar.gz
...
Module Text::NWrap (G/GA/GABOR/Text-Format0.52+NWrap0.11.tar.gz)
Module Text::Quickwrap (Contact Author Ivan Panchenko )
Module Text::Wrap (M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz)
Module Text::Wrap::Hyphenate (Contact Author Mark-Jason Dominus )
Module Text::WrapProp (J/JB/JBRIGGS/Text-WrapProp-0.03.tar.gz)
Module Text::Wrapper (C/CJ/CJM/Text-Wrapper-1.000.tar.gz)
Module XML::XSLT::Wrapper (M/MU/MULL/XML-XSLT-Wrapper-0.32.tar.gz)
41 items found

cpan> install Text::Wrap
Running install for module Text::Wrap
Running make for M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
ftp://archive.progeny.com/CPAN/authors/id/M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz
CPAN: MD5 loaded ok
Fetching with LWP:
ftp://archive.progeny.com/CPAN/authors/id/M/MU/MUIR/modules/CHECKSUMS
Checksum for /bush_home/bush1/lstein/.cpan/sources/authors/id/M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz ok
Scanning cache /bush_home/bush1/lstein/.cpan/build for sizes
Text-Tabs+Wrap-2001.0929/
Text-Tabs+Wrap-2001.0929/MANIFEST
Text-Tabs+Wrap-2001.0929/CHANGELOG
Text-Tabs+Wrap-2001.0929/Makefile.PL
Text-Tabs+Wrap-2001.0929/t/
Text-Tabs+Wrap-2001.0929/t/fill.t
Text-Tabs+Wrap-2001.0929/t/tabs.t
Text-Tabs+Wrap-2001.0929/t/wrap.t
Text-Tabs+Wrap-2001.0929/README
Text-Tabs+Wrap-2001.0929/lib/
Text-Tabs+Wrap-2001.0929/lib/Text/
Text-Tabs+Wrap-2001.0929/lib/Text/Wrap.pm
Text-Tabs+Wrap-2001.0929/lib/Text/Tabs.pm

CPAN.pm: Going to build M/MU/MUIR/modules/Text-Tabs+Wrap-2001.0929.tar.gz

Checking if your kit is complete...
Looks good
Writing Makefile for Text
cp lib/Text/Wrap.pm blib/lib/Text/Wrap.pm
cp lib/Text/Tabs.pm blib/lib/Text/Tabs.pm
Manifying blib/man3/Text::Wrap.3
Manifying blib/man3/Text::Tabs.3
/usr/bin/make -- OK
Running make test
PERL_DL_NONLAZY=1 /net/bin/perl -Iblib/arch -Iblib/lib -I/net/lib/perl5/5.6.1/i686-linux -I/net/lib/perl5/5.6.1 -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests @ARGV;' t/*.t
t/fill..............ok
t/tabs..............ok
t/wrap..............ok
All tests successful.
Files=3, Tests=37, 0 wallclock secs ( 0.20 cusr + 0.00 csys = 0.20 CPU)
/usr/bin/make test -- OK
Running make install
Installing /net/lib/perl5/5.6.1/Text/Wrap.pm
Installing /net/man/man3/Text::Wrap.3
Installing /net/man/man3/Text::Tabs.3
Writing /net/lib/perl5/5.6.1/i686-linux/auto/Text/.packlist
Appending installation info to /net/lib/perl5/5.6.1/i686-linux/perllocal.pod
/usr/bin/make install UNINST=1 -- OK

cpan> quit
Lockfile removed.


More About Importing

Recall that each module has a default list of functions and variables to import. Some modules import many functions by default, others import none. Most import some.

Modules that have a lot of functions and variables to import frequently put them into groups. Groups can be specified using the ":group" syntax.

For example, the CGI::Pretty module has a group called ":standard", which imports a bunch of standard functions for creating HTML pages.


#!/usr/bin/perl
# file: html.pl

use strict;
use CGI::Pretty qw(:standard);

print h1('This is a level one header');
print p('This is a paragraph.');
print p('Here is some',i('italicized'),'text.');

% html.pl

This is a level one header

This is a paragraph.

Here is some italicized text.

The module's documentation will tell you what function groups are defined. To import the default functions, plus optional ones, use the group ":DEFAULT".

use CGI::Pretty qw(:DEFAULT :standard start_html);



Where are Modules Installed?

Module files end with the extension .pm. If the module name is a simple one, like Env, then Perl will look for a file named Env.pm. If the module name is separated by :: sections, Perl will treat the :: characters like directories. So it will look for the module File::Basename in the file File/Basename.pm.

Perl searches for module files in a set of directories specified by the Perl library path. This is set when Perl is first installed. You can find out what directories Perl will search for modules in by issuing perl -V from the command line:

% perl -V
Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
Platform:
osname=linux, osvers=2.4.2-2smp, archname=i686-linux
...
Compiled at Oct 11 2001 11:08:37
@INC:
/usr/lib/perl5/5.6.1/i686-linux
/usr/lib/perl5/5.6.1
/usr/lib/perl5/site_perl/5.6.1/i686-linux
/usr/lib/perl5/site_perl/5.6.1
/usr/lib/perl5/site_perl
.

You can modify this path to search in other locations by placing the use lib command somewhere at the top of your script:

#!/usr/bin/perl

use lib '/home/lstein/lib';
use MyModule;
...

This tells Perl to look in /home/lstein/lib for the module MyModule before it looks in the usual places. Now you can install module files in this directory and Perl will find them.
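To check the library path from inside a script, and to see which file a module was actually loaded from, you can look at the @INC array and the %INC hash. Here is a minimal sketch; the directory /tmp/my_perl_lib is a hypothetical placeholder and does not need to exist:

```perl
#!/usr/bin/perl
# file: inc_demo.pl (hypothetical name)
use strict;
use warnings;

# Hypothetical private library directory; use lib puts it at the front of @INC.
use lib '/tmp/my_perl_lib';
use File::Basename;    # Perl looks for File/Basename.pm along @INC

# The directories Perl searches, in order.
print "Search path:\n";
print "  $_\n" for @INC;

# %INC maps each loaded module file to the path where it was found.
print "File::Basename was loaded from $INC{'File/Basename.pm'}\n";
```

Because use lib runs at compile time, the new directory is already at the front of @INC by the time the use File::Basename line is processed.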

The Anatomy of a Module File

Here is a very simple module file named "MySequence.pm":

package MySequence;
#file: MySequence.pm

use strict;
our $EcoRI = 'gaattc';   # EcoRI recognition site (GAATTC)

sub reversec {
my $sequence = shift;
$sequence = reverse $sequence;
$sequence =~ tr/gatcGATC/ctagCTAG/;
return $sequence;
}

sub seqlen {
my $sequence = shift;
$sequence =~ s/[^gatcnGATCN]//g;
return length $sequence;
}

1;

A module begins with the keyword package and ends with "1;". package gives the module a name, and the 1; is a true value that tells Perl that the module compiled completely without crashing.

The our keyword declares a variable to be global to the module. It is similar to my, but the variable can be shared with other programs and modules ("my" variables cannot be shared outside the current file, subroutine or block). This will let us use the variable in other programs that depend on this module.

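To install this module, just put it somewhere in the Perl module path, or in the current directory. (Perl 5.26 and later no longer search the current directory by default; there, add use lib '.' to the calling script or put the module in a directory on the path.)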
Using the MySequence.pm Module

Using this module is very simple:

#!/usr/bin/perl
#file: sequence.pl

use strict;
use MySequence;

my $sequence = 'gattccggatttccaaagggttcccaatttggg';
my $complement = MySequence::reversec($sequence);

print "original = $sequence\n";
print "complement = $complement\n";

% sequence.pl
original = gattccggatttccaaagggttcccaatttggg
complement = cccaaattgggaaccctttggaaatccggaatc

Unless the module explicitly exports variables or functions, the calling program must qualify each MySequence function with the package name, using the notation:

MySequence::function_name

For a non-exported variable, the notation looks like this:

$MySequence::EcoRI


Exporting Variables and Functions from Modules

To make your module export variables and/or functions like a "real" module, use the Exporter module.

package MySequence;
#file: MySequence.pm

use strict;
use base 'Exporter';

our @EXPORT = qw(reversec seqlen);
our @EXPORT_OK = qw($EcoRI);

our $EcoRI = 'gaattc';   # EcoRI recognition site (GAATTC)

sub reversec {
my $sequence = shift;
$sequence = reverse $sequence;
$sequence =~ tr/gatcGATC/ctagCTAG/;
return $sequence;
}

sub seqlen {
my $sequence = shift;
$sequence =~ s/[^gatcnGATCN]//g;
return length $sequence;
}

1;

The use base 'Exporter' line tells Perl that this module is a type of "Exporter" module. As we will see later, this is a way for modules to inherit properties from other modules. The Exporter module (standard in Perl) knows how to export variables and functions.

The our @EXPORT = qw(reversec seqlen) line tells Perl to export the functions reversec and seqlen automatically. The our @EXPORT_OK = qw($EcoRI) tells Perl that it is OK for the user to import the $EcoRI variable, but not to export it automatically.

The qw() operator creates a list of words, splitting its contents on whitespace. The @EXPORT line above is equivalent to the slightly uglier:

our @EXPORT = ('reversec','seqlen');

Using the Better MySequence.pm Module

Now the module exports its reversec and seqlen functions automatically:

#!/usr/bin/perl
#file: sequence2.pl

use strict;
use MySequence;

my $sequence = 'gattccggatttccaaagggttcccaatttggg';
my $complement = reversec($sequence);

print "original = $sequence\n";
print "complement = $complement\n";

The calling program can also get at the value of the $EcoRI variable, but it has to ask for it explicitly:

#!/usr/bin/perl
#file: sequence3.pl

use strict;
use MySequence qw(:DEFAULT $EcoRI);

my $sequence = 'gattccggatttccaaagggttcccaatttggg';
my $complement = reversec($sequence);

print "original = $sequence\n";
print "complement = $complement\n";

if ($complement =~ /$EcoRI/) {
print "Contains an EcoRI site.\n";
} else {
print "Doesn't contain an EcoRI site.\n";
}

Note that we must now import the :DEFAULT group in order to get the default reversec and seqlen functions.
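Groups such as :standard are defined by the module itself, through Exporter's %EXPORT_TAGS hash. A minimal one-file sketch, assuming a hypothetical module MyCounts that is defined inside a BEGIN block so it exists before the import runs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

BEGIN {
    package MyCounts;    # hypothetical module, normally in its own MyCounts.pm
    use base 'Exporter';

    our @EXPORT      = qw(gc_count);            # imported by default (:DEFAULT)
    our @EXPORT_OK   = qw(at_count n_count);    # imported only on request
    our %EXPORT_TAGS = (                        # named groups, requested as :counts
        counts => [qw(gc_count at_count n_count)],
    );

    # tr/// with an empty replacement list only counts characters.
    sub gc_count { my $seq = shift; return $seq =~ tr/GCgc//; }
    sub at_count { my $seq = shift; return $seq =~ tr/ATat//; }
    sub n_count  { my $seq = shift; return $seq =~ tr/Nn//;   }
}

# Same effect as "use MyCounts qw(:counts);" when MyCounts.pm is on @INC.
BEGIN { MyCounts->import(':counts') }

my $dna = 'GATTACANN';
print "GC: ", gc_count($dna), "  AT: ", at_count($dna), "  N: ", n_count($dna), "\n";
```

Every name listed in a tag must also appear in @EXPORT or @EXPORT_OK, otherwise Exporter refuses to import it.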


Object-Oriented Modules

Some modules are object-oriented. Instead of importing a series of subroutines that are called directly, these modules define a series of object types that you can create and use. We will talk about object-oriented syntax in greater detail in the Perl References and Objects lecture. Here we will just show an example:
The Math::Complex Module

The Math::Complex module is a standard module that implements complex numbers. You work with it by creating one or more Math::Complex objects. You can then manipulate these objects mathematically by adding them, subtracting them, multiplying, and so on. Here is a brief example:

#!/usr/bin/perl
# file: complex.pl

use strict;
use Math::Complex;

my $a = Math::Complex->make(5,6);
my $b = Math::Complex->make(10,20);
my $c = $a * $b;

print "$a * $b = $c\n";

We load the Math::Complex module with use, but now instead of calling imported subroutines, we create two objects named $a and $b. Both are created by calling Math::Complex->make() with two arguments. The first argument is the real part of the complex number, and the second is the imaginary part. The return value from make() is the complex number object. We multiply the two numbers together and store the result in $c. Finally, we print out all three values. The script's output is:

51% perl complex.pl
5+6i * 10+20i = -70+160i

Object Syntax

The call to make() uses Perl's object-oriented syntax. Read it as meaning "invoke the make() subroutine that is located inside the Math::Complex package." The call is similar, but not quite equivalent, to this:

Math::Complex::make(10,20)

The difference is that the object-oriented syntax tells Perl to pass the name of the module as an implicit first argument to make(). Therefore, Math::Complex->make(10,20) is almost exactly equivalent to this:

Math::Complex::make('Math::Complex',10,20)

If you are using object-oriented modules, you will never have to worry about this extra argument. If you are writing object-oriented modules, the necessity for the extra argument will make sense to you.
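To see the extra argument from the module writer's side, here is a minimal sketch with a hypothetical class named Pair (not part of Math::Complex). The arrow call and the spelled-out call hand make() exactly the same arguments:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical class used only to show what the arrow syntax passes in.
package Pair;

sub make {
    my ($class, $first, $second) = @_;   # $class receives the package name
    return bless { first => $first, second => $second }, $class;
}

package main;

my $p1 = Pair->make(10, 20);             # Perl supplies 'Pair' as $class
my $p2 = Pair::make('Pair', 10, 20);     # the same call spelled out by hand

print ref($p1), " ", ref($p2), "\n";     # the class each object belongs to
print "$p1->{first} $p2->{second}\n";
```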

Learn perl easy part4

Filehandles

You can create your own filehandles using the open function, read and/or write to them, and then clean up using close.
open

open opens a file for reading and/or writing, and associates a filehandle with it. You can choose any name for the filehandle, but the convention is to make it all caps. In the examples, we use FILEHANDLE.
open a file for reading: open FILEHANDLE,"cosmids.fasta" (alternative form: open FILEHANDLE,"<cosmids.fasta")

open a file for writing: open FILEHANDLE,">cosmids.fasta"

open a file for appending: open FILEHANDLE,">>cosmids.fasta"

open a file for reading and writing: open FILEHANDLE,"+<cosmids.fasta"

Catching Open Failures

It's common for open to fail. Maybe the file doesn't exist, or you don't have permissions to read or create it. Always check open's return value, which is TRUE if the operation succeeded, FALSE otherwise:

$result = open COSMIDS,"cosmids.fasta";
die "Can't open cosmids file: $!\n" unless $result;

When an error occurs, the $! variable holds a description of the error, such as "No such file or directory".

There is a compact idiom for accomplishing this in one step:

open COSMIDS,"cosmids.fasta" or die "Can't open cosmids file: $!\n";
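Modern Perl code usually writes the same check with the three-argument form of open and a lexical (my) filehandle, which keeps the mode separate from the file name. A sketch; the helper open_for_reading is a hypothetical name, and the file is placed in a fresh temporary directory so that it is certain not to exist:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A scratch directory that is certain not to contain cosmids.fasta.
my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/cosmids.fasta";

# Hypothetical helper wrapping the three-argument open.
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    return $fh;
}

# eval catches the die so we can report the error and carry on.
my $fh = eval { open_for_reading($missing) };
print "open failed: $@" unless $fh;
```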

Using a Filehandle

Once you've created a filehandle, you can read from it or write to it, just as if it were STDIN or STDOUT. This code reads from file "text.in" and copies lines to "text.out":

open IN,"text.in" or die "Can't open input file: $!\n";
open OUT,">text.out" or die "Can't open output file: $!\n";

while ($line = <IN>) {
print OUT $line;
}

Closing a Filehandle

When you are done with a filehandle, you should close it. This also happens automatically when your program ends, or when you open another file using the same filehandle name.

close IN or warn "Errors while closing filehandle: $!";

Some errors, such as a full filesystem, are only reported when you close the filehandle (buffered data is written out at that point), so you should check for errors when you close, just as you do when you open.
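Here is the whole cycle in one place: writing, appending, copying, and closing, with every open and close checked. A minimal sketch using three-argument open and lexical filehandles; the files live in a temporary directory rather than the current one:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);          # scratch directory, deleted at exit
my ($in_file, $out_file) = ("$dir/text.in", "$dir/text.out");

# '>' creates (or truncates) the file; '>>' adds to the end of it.
open(my $w, '>', $in_file)      or die "Can't write $in_file: $!\n";
print $w "line one\n";
close($w)                       or die "Error closing $in_file: $!\n";
open(my $app, '>>', $in_file)   or die "Can't append to $in_file: $!\n";
print $app "line two\n";
close($app)                     or die "Error closing $in_file: $!\n";

# Copy text.in to text.out line by line.
open(my $in,  '<', $in_file)  or die "Can't open input file: $!\n";
open(my $out, '>', $out_file) or die "Can't open output file: $!\n";
while (my $line = <$in>) {
    print $out $line;
}
close($in)  or warn "Errors while closing $in_file: $!\n";
close($out) or warn "Errors while closing $out_file: $!\n";

# Read the copy back to check it.
open(my $check, '<', $out_file) or die "Can't reopen $out_file: $!\n";
my @copied = <$check>;
close($check);
print "copied ", scalar(@copied), " lines\n";
```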

The Magic of <>

The bare <> operator, used without an explicit filehandle, is magical. It reads from each of the files named on the command line as if they were one single large file. If no file is given on the command line, then <> reads from standard input.

This sounds weird, but it is extremely useful.
A Practical Example of <>

Count the number of lines and bytes in a series of files. If no file is specified, count from standard input (like wc does).

Code:

#!/usr/local/bin/perl
# file: wc.pl
($bytes,$lines) = (0,0);

while (<>) {
$bytes += length($_);
$lines++;
}

print "LINES: $lines\n";
print "BYTES: $bytes\n";

Output:

(~/grant) 79% wc.pl progress.txt
LINES: 102
BYTES: 5688

(~/grant) 80% wc.pl progress.txt resources.txt specific_aims
LINES: 481
BYTES: 24733


Globals and Functions that Affect I/O

Several built-in globals affect input and output:
$/ The input record separator. The value of this global is used by <> to determine where the end of a line is. Normally "\n".

$\ The output record separator. Whatever this is set to will appear at the end of everything printed by print. Normally empty.

$, The output field separator. Appears between all items printed with the print function. Normally empty.

$" The output list separator. Interpolated between all items of an array when an array is interpolated into a double-quoted string. Normally a space.

$. The line count. When reading from <>, this will be set to the line number of the "virtual file".
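A short sketch showing four of these globals at work. It prints into an in-memory string rather than the screen so the result is easy to inspect, and uses local so each change is undone at the end of its block:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $buf = '';
open(my $out, '>', \$buf) or die "Can't open in-memory handle: $!";
{
    local $, = '-';     # output field separator: printed between arguments
    local $\ = "\n";    # output record separator: printed after each print
    print $out 'a', 'b', 'c';          # writes "a-b-c\n"
}
{
    my @list = (1, 2, 3);
    local $" = ':';     # list separator for arrays interpolated into strings
    print $out "@list\n";              # writes "1:2:3\n"
}
close($out);

# $. holds the number of the line most recently read.
my $input = "x\ny\nz\n";
open(my $in, '<', \$input) or die "Can't open in-memory input: $!";
my $last_line_number = 0;
while (my $line = <$in>) {
    $last_line_number = $.;
}
close($in);

print $buf, "last line number: $last_line_number\n";
```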

Example use of Input Record Separator

Say you have a text file containing records in the following interesting format:


>gi|5340860|gb|AI793144.1|AI793144 on36f02.y5 NCI_CGAP_Lu5 Homo sapiens cDNA clone
CAAACAGCCCCCGATAACGCTACGTGAGCTGGGCCCTGGGCCTGAGGCAGAAAACGGACGGAAGAAAAGG
TCTGGCCGGAGATGGGTCTCACTCTGTCACCCAGACTGGAGTGCAGTGAGTGGTGCGATCATAGCTTACT
GCAGCCTGAAACTCCTGGGCTCAAGTGATCTTCTCGCCTCAGCCTCCTGAGTAGCTGGAGCTACAGGAAT
GAGCATAGATGAACAATGTTGCATCACGCTTGACATCACCGGNGCTTCTTTCCAGTGTGGATTTGCTCAT
GTAAAATGAGGTGTGAGCTCTGCCTGAAAGCTTTTCCATATGCATCACATTTGCAGGGCTTTTCTCCAGT
GTGGGTTCTTTGGTGTCTCAAAAGATGTGAGCTGTTACTGAAAGCTTTCCCACACACATCACACTCATAG
GGCTTCTCTCTACCGTGGATTCGCTGGTGTCCAACAAGAGCTGAACTGTATCTGAAGGCCTTTCCACGCT
TGTCACATTCATATAGTTTCTTTCCACTGTGGATTNTCTGGTGACAGAAGAGGCCCAAGCACTAGCTAAA
GCTNTTCCCTCACTCACTACACTGCTATGGCTTCTCTTCAGTATGAACTCTGATGTTGTCTCAGATATGA
ACTCAGAGAGGATNTCCCACAATCATTACACTGGTATGGTTCCTTTTCGTGTGAGTTCTCTGGTGTCNAA
ATACATCTGAGCTGTGATGAAAGAACTTNCCACACTCACTACATTGGGAAGG

>gi|4306680|gb|AI451833.1|AI451833 mx13e08.y1 Soares mouse NML Mus musculus cDNA clone
TGAATGTATGCAGTGCGGAAAGACATTCACTTCTGGCCACTGTGCCAGAAGACATTTAGGGACTCACAGT
GGAGCCTGGCCTTACAAATGTGAAGTGTGTGGGAAAGCTTATCCCTACGTCTATTCCCTTCGAAACCACA
AAAAAAGTCACAACGAAGAAAAACTTTATGAATGTAAACAATGTGGGAAAGCCTTTAAATACATTTCTTC
CTTACGCAACCACGAGACTACTCACACTGGAGAGAAGCCCTATGAATGTAAGGAATGTGGGAAAGCCTTT
AGTTGTTCCAGTTACATTCAAAATCACATGAGAACACACAAAAGGCAGTCCTATGAATGTAAGGAGTGTG
GTAAGGTGTTCTCATATTCCAAAAGTCTTCGGAGACACATGACTACACATAGTTAATTAGAGAGGGATAG
TTNTAAGTATAATTTAAATATATAAAAGAGCTCTACACATTCTAGCTCCTCATTAAGAAACAAAAAATTT
CACACTGGAAAACGAGCCTATGAATGCAGTATGTGTGCCAAAGTCTCAGTACATGCCACAGT

>gi|3400733|gb|AI074089.1|AI074089 oq97c08.x1 NCI_CGAP_Co12 Homo sapiens cDNA clone
GAATCTTCTGGGTCCTCTTTATTAAGAGCCCTCTGCCTTCCCAGGGGAGGGAAGCAAATCCTTCAGGGCC
CCCAGAGTTCCTGCACCCCATATCATGGGTGAGTCCTACCAGCCACAGAGCCACCCGTCACCGTGGAGAG
GCTTAAGCTGCACTCAGAGCTCCCCCCGGGCATGCCGAATGTAGTGTTGATGCAGCCCTGCTTCCTGAGC
AAAGTCCTGACCGCACTCTGTGCAGGCGAAGGTGCCAGGAGGGGCACGGACCTCATGCATCTGGCGGTGC
CGCCTCAGAGAAACAGCCTGCCCAAAGGTCTTGCCACAGTCAGGACAAGGGAAGGTGGGCTGGGCAGTAG
TGGTTGCAACCGGCAGGGTGGGCTTGGCGGCTGGACCGTGGCTGCGCTGGTGGGTGATTAGGGCTTTGGA
...

If you use standard <>, you will get a line at a time, and have to figure out where one record ends and a new one starts. However, if you set the input record separator to ">", then each time you read a "line", you will read all the way to the next ">" symbol. Throw away the first record (which is empty), keep the others.

#!/usr/local/bin/perl
# file: get_fasta_records.pl

$/ = '>';

<>; # throw away the first record (will be empty)

while (<>) {
chomp;
# split up lines of the record. The first line
# is the sequence ID. The second and subsequent lines
# are the sequence
my ($id,@sequence) = split "\n";
my $sequence = join '',@sequence; # reassemble the sequence
}
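A runnable variant of the script above that keeps each record, assuming (as a hypothetical choice) that the first word of the header line is the sequence ID. It reads two made-up records from an in-memory string, and uses local so $/ goes back to "\n" afterwards:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A small in-memory FASTA sample (hypothetical records).
my $fasta = ">seq1 first record\nGATTACA\nGGCC\n>seq2 second record\nTTTT\n";
my %sequence_for;

open(my $in, '<', \$fasta) or die "Can't open FASTA data: $!";
{
    local $/ = '>';          # read up to each '>' instead of each newline
    <$in>;                   # the first "record" is just the leading '>'
    while (my $record = <$in>) {
        chomp $record;       # chomp removes the trailing '>' (the value of $/)
        my ($header, @lines) = split /\n/, $record;
        my ($id) = split ' ', $header;         # first word of the header line
        $sequence_for{$id} = join '', @lines;  # reassemble the sequence
    }
}
close($in);

for my $id (sort keys %sequence_for) {
    print "$id\t", length($sequence_for{$id}), " bp\n";
}
```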

Special Uses of the Input Record Separator

The input record separator has two special cases.
Paragraph Mode

If the input record separator ($/) is set to the empty string ("") it goes into paragraph mode. Each <> will read up to the next blank line. Multiple blank lines will be skipped over. This is good for reading text separated into paragraphs.
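A minimal sketch of paragraph mode, reading made-up text from an in-memory string. Note that in paragraph mode chomp removes all trailing newlines, not just one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "First paragraph,\nstill the first.\n\n\n\nSecond paragraph.\n";
open(my $in, '<', \$text) or die "Can't open text: $!";

my @paragraphs;
{
    local $/ = '';                 # paragraph mode
    while (my $para = <$in>) {
        chomp $para;               # removes every trailing newline in this mode
        push @paragraphs, $para;
    }
}
close($in);

print scalar(@paragraphs), " paragraphs\n";
```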
Slurp Mode

If the input record separator is set to the undefined value (undef) then it goes into slurp mode. The <> operator will read its entire input into a single scalar.

Here's how to read the entire file cosmids.fasta into a scalar variable:

open IN,"cosmids.fasta" or die "Can't open cosmids.fasta: $!\n";
$/ = undef;

$data = <IN>; # data slurp
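Setting $/ = undef as above changes it for the rest of the program, so later reads also slurp. A common way to limit the change is local inside a do block. A sketch, with an in-memory string standing in for cosmids.fasta:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = ">cosmid1\nGATTACA\n>cosmid2\nTTGG\n";   # stand-in for cosmids.fasta
open(my $in, '<', \$text) or die "Can't open data: $!";

# local $/ (with no value) sets it to undef only until the do block ends.
my $data = do { local $/; <$in> };
close($in);

print length($data), " characters slurped\n";
```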


Regular Expressions

A regular expression is a string template against which you can match a piece of text. They are something like shell wildcard expressions, but much more powerful.
Examples of Regular Expressions

This bit of code loops through each line of a file, finds all lines containing an EcoRI site (GAATTC), and bumps up a counter:

Code:

#!/usr/bin/perl -w
#file: EcoRI1.pl

use strict;

my $filename = "example.fasta";
open(FASTA, $filename) or die "Can't open $filename: $!\n";
my $sites;

while (my $line = <FASTA>) {
chomp $line;

if ($line =~ /GAATTC/){
print "Found an EcoRI site!\n";
$sites++;
}
}

if ($sites){
print "$sites EcoRI sites total\n";
}else{
print "No EcoRI sites were found\n";
}

#note: if $sites is declared inside while loop you would not be able to
#print it outside the loop

Output:

~]$ ./EcoRI1.pl
Found an EcoRI site!
Found an EcoRI site!
.
.
.
Found an EcoRI site!
Found an EcoRI site!
34 EcoRI sites total
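Strictly speaking, the scripts above count lines that contain a site: a line with two sites counts once, and a site split across two lines is missed. A sketch that counts every site instead, using made-up sequence lines in which the third site spans a line break:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sequence lines; the third site is split across lines 2 and 3.
my @lines = ('CCGAATTCAAGAATTCTT', 'GGGAA', 'TTCAAA');

# Counting lines that contain a site (what EcoRI1.pl does).
my $lines_with_site = grep { /GAATTC/i } @lines;

# Counting every site: rejoin the lines, then let //g find all matches.
my $sequence = join '', @lines;
my $sites = () = $sequence =~ /GAATTC/gi;

print "lines with a site: $lines_with_site\n";
print "EcoRI sites:       $sites\n";
```

The () = in the middle forces the match into list context, where //g returns every match; assigning that list to a scalar gives the number of matches.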


This Works Too!
Code:

#file:EcoRI2.pl
# (assumes FASTA has been opened and $sites declared as in EcoRI1.pl)

while (<FASTA>) {
chomp;
if ($_ =~ /GAATTC/){
print "Found an EcoRI site!\n";
$sites++;
}
}

Output:

~]$ ./EcoRI2.pl
Found an EcoRI site!
Found an EcoRI site!
.
.
.
Found an EcoRI site!
Found an EcoRI site!
34 EcoRI sites total


This Also Works
Code:

#file:EcoRI.pl

while (<FASTA>) {
chomp;
if (/GAATTC/){
print "Found an EcoRI site!\n";
$sites++;
}
}

By default, a regular expression examines $_ and returns TRUE if it matches, FALSE otherwise.
Output:

~]$ ./EcoRI.pl
Found an EcoRI site!
Found an EcoRI site!
.
.
.
Found an EcoRI site!
Found an EcoRI site!
34 EcoRI sites total

This does the same thing, but counts lines containing one type of methylation site (Pu-C-X-G) instead. Because of the .? in the pattern, the X is optional, so Pu-C-G also matches:
Code:

#file:methy.pl

while (<FASTA>) {
chomp;

if (/[GA]C.?G/){ #What Happens If Your File Is Not All In CAPS
#print "Found a Methylation Site!\n";
$sites++;
}
}
if ($sites){
print "$sites Methylation Sites total\n";
}else{
print "No Methylation Sites were found\n";
}



Output:

~]$ ./methy.pl
723 Methylation Sites total

Regular Expression Variable

A regular expression is normally delimited by two slashes ("/"). Everything between the slashes is a pattern to match. Patterns can be made up of the following Atoms:

1. Ordinary characters: a-z, A-Z, 0-9 and some punctuation. These match themselves.

2. The "." character, which matches any single character except the newline.

3. A bracket list of characters, such as [AaGgCcTtNn], [A-F0-9], or [^A-Z] (the last means anything BUT A-Z).

4. Certain predefined character sets:
   \d  The digits [0-9]
   \w  A word character [A-Za-z_0-9]
   \s  White space [ \t\n\r]
   \D  A non-digit
   \W  A non-word character
   \S  Non-whitespace

5. Anchors:
   ^   Matches the beginning of the string
   $   Matches the end of the string
   \b  Matches a word boundary (between a \w and a \W)

Examples:

* /g..t/ matches "gaat", "goat", and "gotta get a goat" (twice)

* /g[gatc][gatc]t/ matches "gaat", "gttt", "gatt", and "gotta get an agatt" (once)

* /\d\d\d-\d\d\d\d/ matches 376-8380 and 5128-8181, but not 055-98-2818.

* /^\d\d\d-\d\d\d\d/ matches 376-8380 and 376-83801, but not 5128-8181.

* /^\d\d\d-\d\d\d\d$/ matches only a line that consists of a telephone number such as 376-8380 and nothing else.

* /\bcat/ matches "cat", "catsup" and "more catsup please" but not "scat".

* /\bcat\b/ matches only text containing the whole word "cat".
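You can check claims like these yourself by running each pattern against each piece of text. A minimal sketch that does so for several of the examples above and counts any result that disagrees with the text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each case: pattern, text, and whether the examples above say it matches.
my @cases = (
    [ qr/g..t/,              'goat',               1 ],
    [ qr/g[gatc][gatc]t/,    'gotta get an agatt', 1 ],
    [ qr/\d\d\d-\d\d\d\d/,   '5128-8181',          1 ],
    [ qr/\d\d\d-\d\d\d\d/,   '055-98-2818',        0 ],
    [ qr/^\d\d\d-\d\d\d\d/,  '376-83801',          1 ],
    [ qr/^\d\d\d-\d\d\d\d/,  '5128-8181',          0 ],
    [ qr/^\d\d\d-\d\d\d\d$/, '376-83801',          0 ],
    [ qr/\bcat/,             'more catsup please', 1 ],
    [ qr/\bcat/,             'scat',               0 ],
    [ qr/\bcat\b/,           'catsup',             0 ],
);

my $failures = 0;
for my $case (@cases) {
    my ($re, $text, $expected) = @$case;
    my $matched = $text =~ $re ? 1 : 0;
    $failures++ if $matched != $expected;
    printf "%-22s %-20s %s\n", $re, "'$text'", $matched ? 'match' : 'no match';
}

# With the g modifier in list context, every non-overlapping match is returned.
my $goat_count = () = 'gotta get a goat' =~ /g..t/g;
print "g..t matches 'gotta get a goat' $goat_count times\n";
```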

Quantifiers

By default, an atom matches once. This can be modified by following the atom with a quantifier:

?      atom matches zero times or exactly once
*      atom matches zero or more times
+      atom matches one or more times
{3}    atom matches exactly three times
{2,4}  atom matches between two and four times, inclusive
{4,}   atom matches at least four times

Examples:

* /goa?t/ matches "goat" and "got". Also any text that contains these words.
* /g.+t/ matches "goat", "goot", and "grant", among others.
* /g.*t/ matches "gt", "goat", "goot", and "grant", among others.
* /^\d{3}-\d{4}$/ matches seven-digit US telephone numbers such as 376-8380 (no extra text allowed).
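The quantifier examples can be checked the same way. A sketch; the {2,4} and {4,} cases are added for illustration and are not from the list above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern, text, and whether it should match.
my @cases = (
    [ qr/goa?t/,         'got',       1 ],
    [ qr/goa?t/,         'goat',      1 ],
    [ qr/g.+t/,          'grant',     1 ],
    [ qr/g.+t/,          'gt',        0 ],   # + needs at least one character
    [ qr/g.*t/,          'gt',        1 ],   # * also allows zero characters
    [ qr/^\d{3}-\d{4}$/, '376-8380',  1 ],
    [ qr/^\d{3}-\d{4}$/, 'x376-8380', 0 ],   # extra text before the number
    [ qr/^A{2,4}$/,      'AAAAA',     0 ],   # five As is more than four
    [ qr/^A{4,}$/,       'AAAAA',     1 ],
);

my $failures = 0;
for my $case (@cases) {
    my ($re, $text, $expected) = @$case;
    my $matched = $text =~ $re ? 1 : 0;
    $failures++ if $matched != $expected;
    printf "%-20s %-12s %s\n", $re, "'$text'", $matched ? 'match' : 'no match';
}
print "$failures unexpected results\n";
```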

Alternatives and Grouping

A set of alternative patterns can be specified with the | symbol:

/wolf|sheep/; # matches "wolf" or "sheep"

/big bad (wolf|sheep)/; # matches "big bad wolf" or "big bad sheep"

You can combine parentheses and quantifiers to quantify entire subpatterns:

/Who's afraid of the big (bad )?wolf\?/;
# matches "Who's afraid of the big bad wolf?" and
# "Who's afraid of the big wolf?"

This also shows how to match a special character literally: put a backslash (\) in front of it, as with the \? above.
Specifying the String to Match

Regular expressions will attempt to match $_ by default. To specify another string variable, use the =~ (binding) operator:

$h = "Who's afraid of Virginia Woolf?";
print "I'm afraid!\n" if $h =~ /Woo?lf/;

There's also an equivalent "not match" operator !~, which reverses the sense of the match:

$h = "Who's afraid of Virginia Woolf?";
print "I'm not afraid!\n" if $h !~ /Woo?lf/;

Using a Different Delimiter

If you want to match slashes in the pattern, you can backslash them:

$file = '/usr/local/blast/cosmids.fasta';
print "local file" if $file =~ /^\/usr\/local/;

This is ugly, so you can specify any match delimiter with the m (match) operator:

$file = '/usr/local/blast/cosmids.fasta';
print "local file" if $file =~ m!^/usr/local!;

The punctuation character that follows the m becomes the delimiter. In fact // is just an abbreviation for m//. Almost any punctuation character will work:

* m!^/usr/local!
* m#^/usr/local#
* m@^/usr/local@
* m,^/usr/local,
* m{^/usr/local}
* m[^/usr/local]

The last two examples show that you can use left-right bracket pairs as well.
Matching with a Variable Pattern

You can use a scalar variable for all or part of a regular expression. For example:

$pattern = '/usr/local';
print "matches" if $file =~ /^$pattern/;
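One thing to watch: the contents of the variable are treated as regular expression syntax, so a "." in a file name matches any character. Wrapping the variable in \Q...\E (the same escaping that the quotemeta function does) makes it match literally. A sketch; the file names are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file    = '/usr/local/blast/cosmids.fasta';
my $pattern = '/usr/local';
print "matches\n" if $file =~ /^$pattern/;

# '.' inside $name is a regex wildcard unless it is quoted with \Q...\E.
my $name   = 'cosmids.fasta';
my $loose  = 'cosmidsXfasta' =~ /^$name$/       ? 1 : 0;   # '.' matched 'X'
my $strict = 'cosmidsXfasta' =~ /^\Q$name\E$/   ? 1 : 0;   # '.' must be a real dot
print "without \\Q: $loose, with \\Q: $strict\n";
```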

See the o flag for important information about using variables inside patterns.

Subpatterns

You can extract and manipulate subpatterns in regular expressions.

To designate a subpattern, surround its part of the pattern with parentheses (the same as with the grouping operator). This example has just one subpattern, (.+) :

/Who's afraid of the big bad w(.+)f/

Matching Subpatterns

Once a subpattern matches, you can refer to it later within the same regular expression. The first subpattern becomes \1, the second \2, the third \3, and so on.

while (<>) {
chomp;
print "I'm scared!\n" if /Who's afraid of the big bad w(.)\1f/
}

This loop will print "I'm scared!" for the following matching lines:

* Who's afraid of the big bad woof
* Who's afraid of the big bad weef
* Who's afraid of the big bad waaf

but not

* Who's afraid of the big bad wolf
* Who's afraid of the big bad wife

In a similar vein, /\b(\w+)s love \1 food\b/ will match "dogs love dog food", but not "dogs love monkey food".
Using Subpatterns Outside the Regular Expression Match

Outside the regular expression match statement, the matched subpatterns (if any) can be found in the variables $1, $2, $3, and so forth.

Example. Extract 50 base pairs upstream and 25 base pairs downstream of the TATTAT consensus transcription start site:


while (<>) {
chomp;
next unless /(.{50})TATTAT(.{25})/;
my $upstream = $1;
my $downstream = $2;
}

Extracting Subpatterns Using Arrays

If you assign a regular expression match to an array, it will return a list of all the subpatterns that matched. Alternative implementation of previous example:


while (<>) {
chomp;
my ($upstream,$downstream) = /(.{50})TATTAT(.{25})/;
}

If the regular expression doesn't match at all, then it returns an empty list. Since an empty list is FALSE, you can use it in a logical test:


while (<>) {
chomp;
next unless my($upstream,$downstream) = /(.{50})TATTAT(.{25})/;
print "upstream = $upstream\n";
print "downstream = $downstream\n";
}
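A self-contained sketch of the same idea. The flank lengths are shrunk to 5 bp upstream and 3 bp downstream (a hypothetical choice) so the results are easy to check by eye, and the reads are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (upstream, downstream) around TATTAT,
# or an empty list if the motif is absent.
sub flanks {
    my ($read) = @_;
    return $read =~ /(.{5})TATTAT(.{3})/;
}

for my $read ('CCCCCAAAAATATTATGGGTT', 'ACGTACGT') {
    # The list assignment is true when it received at least one value.
    if (my ($upstream, $downstream) = flanks($read)) {
        print "$read: upstream = $upstream, downstream = $downstream\n";
    } else {
        print "$read: no TATTAT found\n";
    }
}
```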


Grouping without Making Subpatterns

Because parentheses are used both for grouping (a|ab|c) and for capturing subpatterns, you may capture subpatterns that you don't want. To avoid this, group with (?:pattern):

/big bad (?:wolf|sheep)/;

# matches "big bad wolf" or "big bad sheep",
# but doesn't extract a subpattern.

Subpatterns and Greediness

By default, regular expressions are "greedy". They try to match as much as they can. For example:

$h = 'The fox ate my box of doughnuts';
$h =~ /(f.+x)/;
$subpattern = $1;

Because of the greediness of the match, $subpattern will contain "fox ate my box" rather than just "fox".

To match the minimum number of times, put a ? after the quantifier, like this:

$h = 'The fox ate my box of doughnuts';
$h =~ /(f.+?x)/;
$subpattern = $1;

Now $subpattern will contain "fox". This is called lazy matching.

Lazy matching works with any quantifier, such as +?, *? and {2,50}?.
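A small sketch putting greedy and lazy matching side by side, on the sentence above and on a made-up DNA string containing two EcoRI sites:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $h = 'The fox ate my box of doughnuts';
my ($greedy) = $h =~ /(f.+x)/;     # .+ grabs as much as it can
my ($lazy)   = $h =~ /(f.+?x)/;    # .+? stops at the first x it can

# The same idea on DNA: from a G to the farthest C versus the nearest C.
my $dna = 'GAATTCGGGAATTC';
my ($far)  = $dna =~ /(G.*C)/;
my ($near) = $dna =~ /(G.*?C)/;

print "greedy: $greedy\nlazy:   $lazy\n";
print "G.*C:   $far\nG.*?C:  $near\n";
```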


String Substitution

String substitution allows you to replace a pattern or character range with another one using the s/// and tr/// functions.
The s/// Function

s/// has two parts: the regular expression and the string to replace it with: s/expression/replacement/.

$h = "Who's afraid of the big bad wolf?";
$i = "He had a wife.";

$h =~ s/w.+f/goat/; # yields "Who's afraid of the big bad goat?"
$i =~ s/w.+f/goat/; # yields "He had a goate."

If you extract pattern matches, you can use them in the replacement part of the substitution:

$h = "Who's afraid of the big bad wolf?";

$h =~ s/(\w+) (\w+) wolf/$2 $1 wolf/;
# yields "Who's afraid of the bad big wolf?"

Default Substitution Variable

If you don't bind a variable with =~, then s/// operates on $_ just as the match does.
Using a Variable in the Substitution Part

Yes, you can:

$h = "Who's afraid of the big bad wolf?";
$animal = 'hyena';
$h =~ s/(\w+) (\w+) wolf/$2 $1 $animal/;
# yields "Who's afraid of the bad big hyena?"

Using Different Delimiters

The s/// function can use alternative delimiters, including parentheses and bracket pairs. For example:

$h = "Who's afraid of the big bad wolf?";

$h =~ s!(\w+) (\w+) wolf!$2 $1 wolf!; # using ! as delimiter

$h =~ s{(\w+) (\w+) wolf}{$2 $1 wolf}; # using {} as delimiter

Translating Character Ranges

The tr/// function allows you to translate one set of characters into another. Specify the source set in the first part of the function, and the destination set in the second part:

$h = "Who's afraid of the big bad wolf?";
$h =~ tr/ao/AO/; # yields "WhO's AfrAid Of the big bAd wOlf?";

Like s///, the tr/// function operates on $_ if not otherwise specified.

tr/// returns the number of characters transformed, which is sometimes handy for counting the number of a particular character without actually changing the string.

This example counts N's in a series of DNA sequences:

Code:


while (<>) {
chomp; # assume one sequence per line
my $count = tr/Nn/Nn/;
print "Sequence $. contains $count Ns\n"; # $. holds the current line number
}

Output:

(~) 50% count_Ns.pl sequence_list.txt
Sequence 1 contains 0 Ns
Sequence 2 contains 3 Ns
Sequence 3 contains 1 Ns
Sequence 4 contains 0 Ns
...
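If the replacement list is left empty, tr/// reuses the search list, so tr/Nn// counts just like tr/Nn/Nn/ and leaves the string unchanged. A sketch with made-up sequences kept in an array instead of read from a file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sequences, one per element (instead of one per input line).
my @sequences = ('GATTACA', 'GANNTNCA', 'nnACGT');

my @counts;
for my $seq (@sequences) {
    # An empty replacement list means "count only"; $seq is not modified.
    my $count = ($seq =~ tr/Nn//);
    push @counts, $count;
}

for my $i (0 .. $#counts) {
    print "Sequence ", $i + 1, " contains $counts[$i] Ns\n";
}
```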


Regular Expression Options

Regular expression matches and substitutions have a whole set of options, which you can turn on by appending one or more of the i, m, s, g, e, o or x modifiers to the end of the operation. See Programming Perl, page 153, for more information. For example:

$string = 'Big Bad WOLF!';
print "There's a wolf in the closet!" if $string =~ /wolf/i;
# i is used for a case insensitive match

i Case insensitive match.

g Global match (see below).

e Evaluate right side of s/// as an expression.

o Only compile variable patterns once (see below).

m Treat string as multiple lines. ^ and $ will match at start and end of internal lines, as well as at beginning and end of whole string. Use \A and \Z to match beginning and end of whole string when this is turned on.

s Treat string as a single line. "." will match any character at all, including newline.

x Allow extra whitespace and comments in pattern.
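A short sketch exercising the i, s, m, x and e modifiers on made-up strings (g and o are covered in their own sections):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# i: case-insensitive match
my $string = 'Big Bad WOLF!';
my $found_wolf = $string =~ /wolf/i ? 1 : 0;

# s and m change how . and ^ treat newlines
my $text        = "line one\nline two";
my $dot_plain   = $text =~ /one.line/  ? 1 : 0;   # 0: . does not match "\n"
my $dot_s       = $text =~ /one.line/s ? 1 : 0;   # 1: with s, . matches "\n"
my $line_starts = () = $text =~ /^line/mg;        # 2: with m, ^ matches after each "\n"

# x: whitespace and comments inside the pattern are ignored
my ($exchange, $number) = '376-8380' =~ / (\d{3})   # exchange
                                          -         # separator
                                          (\d{4})   # line number
                                        /x;

# e: the replacement part is evaluated as Perl code (g applies it to every digit)
(my $doubled = 'a1 b2 c3') =~ s/(\d)/$1 * 2/eg;

print "i: $found_wolf  plain: $dot_plain  s: $dot_s  m: $line_starts\n";
print "x: $exchange $number  e: $doubled\n";
```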
Global Matches

Adding the g modifier to the pattern causes the match to be global. In a scalar context (such as the condition of an if or while statement), each evaluation finds the next match, starting where the previous one left off, so a while loop visits every match in turn.

This will match all codons in a DNA sequence, printing them out on separate lines:

Code:

$sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';
while ( $sequence =~ /(.{3})/g ) {
print $1,"\n";
}

Output:

GTT
GCC
TGA
AAT
GGC
GGA
ACC
TTG

If you perform a global match in a list context (e.g. assign its result to an array), then you get a list of all the subpatterns that matched from left to right. This code fragment gets arrays of codons in three reading frames:

@frame1 = $sequence =~ /(.{3})/g;
@frame2 = substr($sequence,1) =~ /(.{3})/g;
@frame3 = substr($sequence,2) =~ /(.{3})/g;
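The same three frames can be built in a loop and scanned for stop codons. A sketch; keeping each frame as an array reference and counting TAA/TAG/TGA are illustrative choices, not part of the original example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';
my %is_stop  = (TAA => 1, TAG => 1, TGA => 1);

my @frames;
for my $offset (0 .. 2) {
    # Skip $offset bases, then collect every complete codon.
    my @codons = substr($sequence, $offset) =~ /(.{3})/g;
    push @frames, \@codons;
    my $stops = grep { $is_stop{$_} } @codons;
    print "frame ", $offset + 1, ": @codons  ($stops stop codons)\n";
}
```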

The pos function returns the offset just past the end of the most recent g match on a variable, counting from 0.
Code:

#file:pos.pl
my $seq = "XXGGATCCXX";

if ( $seq =~ /(GGATCC)/gi ){
my $pos = pos($seq); # offsets are counted from 0
print "Our Sequence: $seq\n";
print '$pos = ', "1st position after the match: $pos\n";
print '$pos - length($1) = 1st position of the match: ',($pos-length($1)),"\n";
print '($pos - length($1))-1 = 1st position before the match: ',($pos-length($1)-1),"\n";
}

Output:

~]$ ./pos.pl
Our Sequence: XXGGATCCXX
$pos = 1st position after the match: 8
$pos - length($1) = 1st position of the match: 2
($pos - length($1))-1 = 1st position before the match: 1
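Perl also records match offsets directly in the special arrays @- and @+, which saves the arithmetic. A sketch that finds every site in a made-up string with two copies of GGATCC:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = 'XXGGATCCXXGGATCC';

# After a successful match, $-[0] is the offset where the whole match starts
# and $+[0] the offset just past its end (both counted from 0).
my @sites;
while ($seq =~ /GGATCC/gi) {
    push @sites, [ $-[0], $+[0] ];
    print "site at offsets ", $-[0], " to ", $+[0] - 1, "; pos() = ", pos($seq), "\n";
}
```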

Variable Interpolation and the "o" Modifier

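If you use a variable inside a pattern template, as in /$pattern/, Perl has to check each time the match runs whether the variable's value has changed, and recompiles the pattern if it has. If $pattern doesn't change over the life of the program, you can use the o ("once") modifier to tell Perl to compile the pattern only once and skip that check. Be careful: with o, later changes to $pattern are silently ignored, and since modern versions of Perl already avoid unnecessary recompilation, the speed gain is usually small: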

$codon = '.{3}';
@frame1 = $sequence =~ /($codon)/og;
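Another way to reuse a pattern is to compile it once with qr// and keep the compiled pattern in a variable, which can then be embedded in larger patterns. A sketch on the codon example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';

# qr// compiles the pattern and stores it in a variable.
my $codon  = qr/.{3}/;
my @frame1 = $sequence =~ /($codon)/g;
print scalar(@frame1), " codons: @frame1\n";
```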

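Testing Your Regular Expressions

To be sure that you are getting what you think you want, you can use Perl's automatic match variables: $& holds the text that matched, $` the text before the match, and $' the text after it. In Perl versions before 5.20, using any of these three variables anywhere in a program slows down every regular expression match, so they are best kept for testing and debugging.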
Code:

#file:matchTest.pl

if ("Hello there, neighbor" =~ /\s(\w+),/){
print "That actually matched '$&'.\n";
print "That was ($`) ($&) ($').\n";
}

Output:

That actually matched ' there,'.
That was (Hello) ( there,) ( neighbor).


Learn perl easy part3

An Array Is a List of Values

For example, consider a list with the number 3.14 as the first element, the string 'abA' as the second element, and the number 65065 as the third element.

"Literal Representation"

We write the list above as
(3.14, 'abA', 65065)

If $pi = 3.14 and $s = 'abA', we can also write
($pi, $s, 65065)

We can also do integer ranges:
(-1..5)

shorthand for
(-1, 0, 1, 2, 3, 4, 5)

Counting down is not allowed! A range like (5..-1) gives an empty list; use reverse(-1..5) instead.
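A small runnable check of the range rules: an ascending range expands to every integer, a "backwards" range is empty, and reverse gives you the countdown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @up   = (-1..5);          # (-1, 0, 1, 2, 3, 4, 5)
my @none = (5..-1);          # left end larger than right end: empty list
my @down = reverse(-1..5);   # counting down: reverse an ascending range

print "up:   @up\n";
print "none: ", scalar(@none), " elements\n";
print "down: @down\n";
```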

Array Variables and Assignment

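my $pi = 3.14;
my $x = 65065;

my @x = ($pi, 'abA', $x);     # (3.14, 'abA', 65065)
my @y = (-1..5);              # (-1, 0, 1, 2, 3, 4, 5)
my @z = ($x, $pi, @x, @y);    # arrays are flattened into one 12-element list
my ($first, @rest) = @z;      # $first gets 65065, @rest gets the other 11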

Getting at Array Elements

$z[0];        # 65065

$z[0] = 2;
$z[0];        # 2
$z[$#z];      # 5 ($#z is the index of the last element)
Skip "slices" for now.

Push, Pop, Shift, Unshift

Add 9 to the end of @z:
push @z, 9;

Take the 9 off the end of @z, and then take the 5 off the end:
my $end1 = pop @z;   # 9
my $end2 = pop @z;   # 5

Add 9 to the beginning of @z:
unshift @z, 9;

Take the 9 off the beginning of @z, and then take the next element off the beginning (that is 2, because we set $z[0] = 2 above):
my $b1 = shift @z;   # 9
my $b2 = shift @z;   # 2
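Putting the pieces together, this sketch rebuilds @z exactly as in the text (including the $z[0] = 2 assignment) and runs the push/pop/unshift/shift sequence so you can check each value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rebuild @z as in the text above.
my $pi = 3.14;
my $x  = 65065;
my @x  = ($pi, 'abA', $x);
my @y  = (-1..5);
my @z  = ($x, $pi, @x, @y);
$z[0]  = 2;

push @z, 9;                # add 9 to the end
my $end1 = pop @z;         # 9
my $end2 = pop @z;         # 5
unshift @z, 9;             # add 9 to the beginning
my $b1 = shift @z;         # 9
my $b2 = shift @z;         # 2, because $z[0] was set to 2

print "end1=$end1 end2=$end2 b1=$b1 b2=$b2\n";
print "remaining: @z\n";
```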

Reverse

my @zr = reverse @z;

Sorting

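As strings (the default; this compares character codes, so uppercase sorts before lowercase and '10' sorts before '9'):
my @zs = sort @z;

Numerically:
my @q = sort { $a <=> $b } (-1, 3, -20);   # (-20, -1, 3)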
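The difference between the default string sort and a numeric sort shows up quickly with numbers of different lengths. A quick sketch with made-up values:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nums = (10, 9, 100, -20);

my @as_strings = sort @nums;                 # compares as strings
my @as_numbers = sort { $a <=> $b } @nums;   # compares numerically, ascending
my @descending = sort { $b <=> $a } @nums;   # swap $a and $b to reverse

print "strings:    @as_strings\n";
print "numbers:    @as_numbers\n";
print "descending: @descending\n";
```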

Split and Join

my @parts = split /\d+/, 'abd1234deff0exx';   # ('abd', 'deff', 'exx')
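join is the opposite of split: it glues a list together with a separator string. A short sketch; the tab-delimited line is just an invented example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @parts  = split /\d+/, 'abd1234deff0exx';   # ('abd', 'deff', 'exx')
my $joined = join ',', @parts;                 # 'abd,deff,exx'

# A common use: take apart a tab-delimited line and put it back together.
my @fields = split /\t/, "chr1\t100\t200";     # ('chr1', '100', '200')
my $line   = join "\t", @fields;

print "$joined\n";
print scalar(@fields), " fields\n";
```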

Swallowing Whole Files in a Single Gulp

my @i = <>;    # read every line of the input files (or STDIN) at once

chomp @i;      # remove the newline from every line

Array and Scalar Context

The notion of list (array) and scalar context is characteristic of Perl. Usually you can remain unaware of it, but it matters for reverse, and it can be used to get the size of an array.
print reverse 'ab';          # prints ab!!! (list context: reverses a one-element list)

$ba = reverse 'ab';          # $ba contains 'ba' (scalar context: reverses the string)
print scalar reverse 'ab';   # prints ba
print scalar @z;             # prints the size of @z
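This sketch shows the same function behaving differently in the two contexts. In scalar context reverse concatenates all its arguments and then reverses the resulting string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @z = (3, 1, 2);

my @list  = reverse('ab', 'cd');   # list context: the list is reversed
my $str   = reverse('ab', 'cd');   # scalar context: 'abcd' reversed
my $count = @z;                    # array in scalar context: its size
my $last  = $#z;                   # index of the last element

print "list:  @list\n";
print "str:   $str\n";
print "count: $count, last index: $last\n";
```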

A Hash Is a Lookup Table

A hash is a lookup table. We use a key to find an associated value.
my %translate;

$translate{'atg'} = 'M';
$translate{'taa'} = '*';
$translate{'ctt'} = 'K'; # oops
$translate{'ctt'} = 'L'; # fixed
print $translate{'atg'};

Getting All Keys

keys %translate    # ('atg', 'taa', 'ctt') in no particular order

Removing Key, Value Pairs

delete $translate{'taa'};

keys %translate;   # now only 'atg' and 'ctt'

Initializing From a List

%translate = ( 'atg' => 'M',
               'taa' => '*',
               'ctt' => 'L',
               'cct' => 'P', );
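As a sketch of how such a table might be used, the loop below splits a made-up DNA string into codons with the global match from the regular-expression section and looks each one up. Marking codons missing from this tiny table with 'X' is an assumption of the example, not a rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %translate = ( 'atg' => 'M',
                  'taa' => '*',
                  'ctt' => 'L',
                  'cct' => 'P', );

my $dna     = 'atgcttcctgggtaa';   # hypothetical input: atg ctt cct ggg taa
my $protein = '';

# Split the sequence into codons and look each one up in the hash.
foreach my $codon ( $dna =~ /(.{3})/g ) {
    # 'ggg' is not in this small table, so it becomes 'X'.
    $protein .= exists $translate{$codon} ? $translate{$codon} : 'X';
}

my @codons = sort keys %translate;   # sort, because keys has no fixed order

print "protein: $protein\n";
print "codons in table: @codons\n";
```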


Basic Loops

Loops let you execute the same statements over and over again.


while Loops

A while loop has a condition at the top. The test is evaluated before each pass, and the code within the body keeps executing until the test becomes false.

while ( TEST ) {
    Code to execute
} continue {
    Optional code to execute at the end of each loop
}

Example: Count the number of times "potato" appears in a list

Code:

#!/usr/local/bin/perl

# file: spud_counter.pl

$count = 0;

while ( $word = shift ) {   # read from command line (shift takes from @ARGV here)
    if ($word eq 'potato') {
        print "Found a potato!\n";
        $count++;
    } else {
        print "$word is not a potato\n";
    }
}

print "Potato count: $count\n";

Output:

(~) 51% spud_counter.pl potato potato tomato potato boysenberry

Found a potato!
Found a potato!
tomato is not a potato
Found a potato!
boysenberry is not a potato
Potato count: 3

Another Example: Count Upward from 1 to 5

Code:

#!/usr/local/bin/perl

# file: count_up.pl

$count = 1;
while ( $count <= 5 ) {
    print "count: $count\n";
    $count++;
}

Output:

(~) 51% count_up.pl

count: 1
count: 2
count: 3
count: 4
count: 5

Yet Another Example: Count Down from 5 to 1

Code:

#!/usr/local/bin/perl

# file: count_down.pl

$count = 6;
while ( --$count > 0 ) {   # decrement first, then test
    print "count: $count\n";
}

Output:

(~) 51% count_down.pl

count: 5
count: 4
count: 3
count: 2
count: 1

The continue Block

while loops can have an optional continue block containing code that is executed at the end of each loop, just before jumping back to the test at the top:

#!/usr/local/bin/perl

# file: count_up.pl

$count = 1;
while ( $count <= 5 ) {
    print "count: $count\n";
} continue {
    $count++;
}

continue blocks will make more sense after we consider the loop control statements (next, last and redo).


The until Loop

Sometimes you want to loop until some condition becomes true, rather than until some condition becomes false. The until loop is easier to read than the equivalent while (!TEST).

my $counter = 5;

until ( $counter < 0 ) {
    print $counter--,"\n";   # prints 5 down to 0
}

foreach Loops

foreach will process each element of an array or list:

foreach $loop_variable ('item1','item2','item3') {
    print $loop_variable,"\n";
}

@array = ('item1','item2','item3');
foreach $loop_variable (@array) {   # same thing, but with an array
    print $loop_variable,"\n";
}

@array = ('item1','item2','item3');
foreach (@array) {                  # same difference
    print $_,"\n";
}

The last example is interesting. It shows that if you don't explicitly give foreach a loop variable, the special scalar variable $_ is used.

Changing Values with the foreach Loop

If you modify the loop variable in a foreach loop, the underlying array value will change!

Code:

@h = (1..5);   # make an array containing the numbers 1 through 5

foreach $variable (@h) {
    $variable .= ' potato';   # $variable is an alias for each element of @h
}

print join("\n",@h),"\n";

Output:

1 potato
2 potato
3 potato
4 potato
5 potato

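This works with the automatic $_ variable too. Here each DNA sequence in @h is replaced by its reverse complement:

Code:

@h = ('CCCTTT','AAAACCCC','GAGAGAGA');

foreach (@h) {
    ($_ = reverse $_) =~ tr/GATC/CTAG/;   # reverse, then complement, in place
    print "$_\n";
}

Output:

AAAGGG
GGGGTTTT
TCTCTCTC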

Advanced Loops

The for Loop

Consider the standard while loop:

initialization code
while ( test code ) {
    Code to execute in body
} continue {
    update code
}

This can be generalized into the concise for loop:

for ( initialization code; test code; update code ) {
    body code
}

When the loop is first entered, the initialization code is executed once. Before each pass through the loop, the test code is evaluated, and the loop stops as soon as it returns false. After each pass, the update code is executed.

Compare the process of counting from 1 to 5:

# with a while loop
$count = 1;
while ( $count <= 5 ) {
    print $count,"\n";
} continue {
    $count++;
}

# with a for loop
for ( my $count=1; $count<=5; $count++ ) {
    print $count,"\n";
}

Notice how we use my to make $count visible only inside the for loop.

Fancy for() Loops

Any of the three for components is optional. You can even leave them all off to get an infinite loop:

for (;;) {
    print "Somebody help me! I can't stop!\n";
}

# equivalent to:
while (1) {
    print "Somebody help me! I can't stop!\n";
}

Any of the components can be a comma-separated list of expressions. This is usually used to initialize several variables at once:

# read until the "end" line or 10 lines, whichever
# comes first....
for (my $done=0, my $i=1; $i<=10 and !$done; $i++) {
    my $line = <STDIN>;
    last unless defined $line;   # stop early at end of input
    chomp $line;
    $done++ if $line eq 'end';
}

Loop Control

The next, last, and redo statements allow you to change the flow of control in the loop mid-stream, as it were. You can use these three statements in while loops, until loops, and for and foreach loops, but not in the do-until and do-while variants.

next

The next statement causes the rest of the loop to be skipped and control to pass back to the conditional test at the top. If there's a continue block, it is executed before control returns to the top of the loop.

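$done = 0;
while (!$done) {
    $line = <STDIN>;
    last unless defined $line;   # stop at end of input
    chomp $line;
    next if $line eq 'SKIP';     # skips the print, but the continue block still runs
    print $line,"\n";
} continue {
    $done++ if $line eq 'END';
}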

last

The last statement causes the loop to terminate prematurely, even if the loop conditional is still true:

while ( $line = <STDIN> ) {
    chomp $line;
    last if $line eq 'END';   # leave the loop immediately
    print $line,"\n";
}

redo

The redo statement is rarely used. Like next, it jumps back to the top of the loop, but it restarts the body without re-evaluating the loop condition. The continue block, if any, is not executed, and in a for loop the update expression is not executed.

for (my $i=0; $i<10; $i++) {
    my $line = <STDIN>;
    last unless defined $line;   # stop at end of input
    chomp $line;
    redo if $line eq 'SKIP';     # $i won't get incremented in this case
    print "Read line $i\n";
}
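The examples above read from STDIN. This sketch uses a hypothetical array of lines instead, so it runs without typing any input, and it counts how often the continue block runs: next still goes through the continue block, while last leaves without it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @input = ('one', 'SKIP', 'two', 'END', 'three');   # made-up input lines
my @printed;
my $continue_runs = 0;

foreach my $line (@input) {
    next if $line eq 'SKIP';   # skip the rest of this pass
    last if $line eq 'END';    # leave the loop entirely
    push @printed, $line;
} continue {
    $continue_runs++;          # runs after normal passes and after next
}

print "printed: @printed\n";
print "continue block ran $continue_runs times\n";
```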

Nested Loops

If you have two or more nested loops, next, last and redo always apply to the innermost loop. You can change this by explicitly labeling the loop block and referring to the label in the loop control statement:

XLOOP:
for (my $x=0; $x<10; $x++) {
    for (my $y=0; $y<100; $y++) {
        # @array is assumed to be a two-dimensional array (array of arrays)
        next XLOOP unless $array[$x][$y] > 0;   # go on to the next $x
        print "($x,$y) = $array[$x][$y]\n";
    }
}
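A self-contained sketch of the labeled loop, with a small made-up two-dimensional array. The inner loop bounds come from the actual row lengths, and a non-positive entry abandons the rest of its row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = ( [1, 2, 0, 4], [0, 5, 6], [7, 8, 9] );   # hypothetical data
my @visited;

XLOOP:
for my $x (0 .. $#array) {
    for my $y (0 .. $#{ $array[$x] }) {
        # next XLOOP skips the rest of this row, not just this element.
        next XLOOP unless $array[$x][$y] > 0;
        push @visited, "($x,$y)";
    }
}

print "visited: @visited\n";
```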

Basic I/O

I/O means input/output, and is necessary to get computer programs to talk to the rest of the world.

The STDIN, STDOUT and STDERR Filehandles

Every Perl script starts out with three connections to the outside world:

STDIN Standard input, used to read input. Initially connected to the keyboard, but can be changed from the shell using redirection (<) or a pipe (|).

STDOUT Standard output, used to write data out. Initially connected to the terminal, but can be redirected to a file or other program from the shell using redirection or pipes.

STDERR Standard error, used for diagnostic messages. Like STDOUT, initially connected to the terminal, but it can be redirected separately, so error messages still show up when normal output goes to a file.

In addition to these three filehandles, you can create your own.

Reading Data from STDIN

To read a line of data into your program, use the angle-bracket operator:

$line = <STDIN>;

will read one line of input from standard input and return it. You will usually assign the result to a scalar variable. The newline is not removed automatically; you have to do that yourself with chomp:

print "Type your name: ";
$name = <STDIN>;
chomp $name;
if ($name eq 'Jim Watson') {
    print "Hail great master!\n";
} else {
    print "Hello $name\n";
}

The read/chomp sequence is often abbreviated as:

chomp($name = <STDIN>);

The Input Loop

At the "end of file" (or when the user presses ^D to end input), <STDIN> will return whatever's left, which may or may not include a newline. Thereafter, <STDIN> will return the undefined value.

This leads to the typical input loop:

while ( $line = <STDIN> ) {
    chomp $line;
    # now do something with $line...
}

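The while loop will read one line of text after another. At the end of input, the angle-bracket operator returns undef and the while loop terminates. Remember that even blank lines are TRUE, because they consist of a single newline character. (Perl also quietly adds a defined() test to this form of while loop, so a last line such as "0" without a newline does not end the loop early.)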
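To try the input loop without typing anything, this sketch opens an in-memory filehandle on a made-up string in place of STDIN. It includes a blank line and a final "0" with no newline; both are read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input standing in for STDIN.
my $text = "first line\n\nthird line\n0";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @lines;
while ( my $line = <$fh> ) {   # perl adds an implicit defined() test here
    chomp $line;
    push @lines, $line;
}
close $fh;

print scalar(@lines), " lines read\n";
```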

The Default Input Variable

If you don't assign the result of the angle-bracket operator to a scalar variable, it will default to the special scalar variable $_. This scalar is the default for a number of other functions, including chomp and the regular expression match.

This example prepends the line number to its input.

Code:

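#!/usr/local/bin/perl

# file: add_line_numbers.pl

$line_number = 0;
while ( <> ) {   # reads the files named on the command line, or STDIN; each line goes into $_
    chomp;       # chomp with no argument works on $_
    print $line_number++,": ",$_,"\n";
}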

Output:

(~) 50% add_line_numbers.pl 

0: Gabor Marth gmarth@watson.wustl.edu
1: Genome Sequencing Center
2: Washington University School of Medicine
3: 4444 Forest Park Blvd.
4: St. Louis, MO 63108
5: 314 286-1839
6: 314 286-1810 (fax)
7: Dates: Oct 17-23
8:
9: Sean Eddy eddy@genetics.wustl.edu
10: Assistant professor
11: Department of Genetics
12: Washington University School of Medicine
13: 660 S. Euclid Ave.
14: St. Louis, Mo. 63110
15: 314 362-7666
16: 314 362-7855 (fax)
17: Dates: Oct 20-22
18:
19: Warren Gish gish@sapiens.wustl.edu
...

Assigning to an Array

Normally you assign the result of the angle-bracket operator to a scalar variable, getting a line of input. What if you assign to an array? You get all the lines from the input file or terminal, one per array element!!!

It is convenient to pass this array to chomp, which will remove the newline from each member of the array.

@lines = <STDIN>;   # get all lines
chomp @lines;       # remove all newlines

Or you can do both things in one elegant operation:

chomp(@lines = <STDIN>);
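Here is a runnable sketch of the one-step version, again reading from an in-memory filehandle on a made-up string so no typing is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "ATG\nCTT\nTAA\n";   # hypothetical three-line input
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

chomp( my @lines = <$fh> );     # list context: read every line at once
close $fh;

print scalar(@lines), " lines: @lines\n";
```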


Output

The print function writes data to output. In its full form, it takes a filehandle as its first argument, followed by a list of scalars to print:

print FILEHANDLE $data1,$data2,$data3,...

Notice there is no comma between FILEHANDLE and the data arguments. If FILEHANDLE is omitted it defaults to STDOUT (this can be changed with the select function). So these are equivalent:

print STDOUT "Hello world\n";
print "Hello world\n";

To print to standard error:

print STDERR "Does not compute.\n";
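The same rules apply to filehandles you open yourself. This sketch prints into a string through an in-memory filehandle so the result can be checked; note again that there is no comma after the filehandle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory file: $!";

print $out "Hello", " ", "world\n";   # no comma after the filehandle
print {$out} "second line\n";         # braces also mark the filehandle
close $out;

print STDERR "Does not compute.\n";   # diagnostics go to standard error
print $buffer;                        # plain print goes to STDOUT
```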