Thursday, August 20, 2009

Learn Perl Easy, Part 3

An Array Is a List of Values

For example, consider a list whose first element is the number 3.14, whose second element is the string 'abA', and whose third element is the number 65065.

Literal Representation

We write the list above as:
(3.14, 'abA', 65065)

If $pi = 3.14 and $s = 'abA', we can also write:
($pi, $s, 65065)

We can also write integer ranges:
(-1..5)

which is shorthand for
(-1, 0, 1, 2, 3, 4, 5)

Counting down is not allowed: (5..1) is an empty list, not (5, 4, 3, 2, 1).
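A small sketch of what a backwards range gives you, plus one common way to count down anyway: build the ascending range and reverse it.

```perl
use strict;
use warnings;

my @empty = (5..1);            # left end larger than right end: empty list
my @down  = reverse(1..5);     # reverse an ascending range to count down

print scalar(@empty), "\n";    # prints 0
print "@down\n";               # prints 5 4 3 2 1
```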

Array Variables and Assignment

my $pi = 3.14;
my $x = 65065;

my @x = ($pi, 'abA', $x);
my @y = (-1..5);
my @z = ($x, $pi, @x, @y);
my ($first, @rest) = @z;
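To see what the assignments above actually produce, here is a runnable version that reports the sizes. Arrays inside a list are flattened into it, so @z ends up with 12 separate elements, and in the last assignment $first takes one value while @rest swallows everything else:

```perl
use strict;
use warnings;

my $pi = 3.14;
my $x  = 65065;

my @x = ($pi, 'abA', $x);
my @y = (-1..5);
my @z = ($x, $pi, @x, @y);     # @x and @y are flattened into one list
my ($first, @rest) = @z;       # $first takes one value, @rest takes the others

print scalar(@z), "\n";        # prints 12 (2 scalars + 3 from @x + 7 from @y)
print "$first\n";              # prints 65065
print scalar(@rest), "\n";     # prints 11
```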

Getting at Array Elements

$z[0]      # 65065

$z[0] = 2;
$z[0]       # now 2
$z[$#z]     # 5; $#z is the index of the last element

We will skip "slices" for now.
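Two related facts, sketched below with @z as built above (after $z[0] = 2): $#z is always one less than the number of elements, and a negative index counts back from the end, so $z[-1] is another way to write $z[$#z].

```perl
use strict;
use warnings;

# @z as built above, after $z[0] = 2
my @z = (2, 3.14, 3.14, 'abA', 65065, -1..5);

print scalar(@z), "\n";        # prints 12, the number of elements
print $#z, "\n";               # prints 11, the index of the last element
print $z[$#z], "\n";           # prints 5
print $z[-1], "\n";            # prints 5: negative indices count from the end
print $z[-2], "\n";            # prints 4
```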

Push, Pop, Shift, Unshift

Add 9 to the end of @z:
push @z, 9;

Take the 9 off the end of @z, and then take the 5 off the end:
my $end1 = pop @z;
my $end2 = pop @z;

Add 9 to the beginning of @z:
unshift @z, 9;

Take the 9 off the beginning of @z, and then take the 2 off the beginning (remember that we set $z[0] to 2 above):
my $b1 = shift @z;
my $b2 = shift @z;
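A runnable trace of the four operations, assuming @z holds the values from the previous sections with $z[0] already changed to 2. The comments show what each step adds or removes:

```perl
use strict;
use warnings;

my @z = (2, 3.14, 3.14, 'abA', 65065, -1..5);

push @z, 9;                    # append 9 at the end
my $end1 = pop @z;             # removes and returns 9
my $end2 = pop @z;             # removes and returns 5
unshift @z, 9;                 # put 9 in front of the 2
my $b1 = shift @z;             # removes and returns 9
my $b2 = shift @z;             # removes and returns 2

print join(',', $end1, $end2, $b1, $b2), "\n";   # prints 9,5,9,2
print "@z\n";                  # prints 3.14 3.14 abA 65065 -1 0 1 2 3 4
```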

Reverse

my @zr = reverse @z;

Sorting

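As strings (the default). The comparison goes character by character using character codes, so uppercase letters sort before lowercase ones and '10' sorts before '9':
my @zs = sort @z;

Numerically:
my @q = sort { $a <=> $b } (-1, 3, -20);   # (-20, -1, 3)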
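The difference between the two kinds of sort shows up as soon as numbers have different lengths. A sketch with some made-up values; swapping $a and $b inside the comparison block gives descending order:

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, -1);

my @as_strings = sort @nums;                  # default: compare as strings
my @as_numbers = sort { $a <=> $b } @nums;    # compare as numbers
my @descending = sort { $b <=> $a } @nums;    # $b before $a reverses the order

print "@as_strings\n";    # prints -1 10 100 9
print "@as_numbers\n";    # prints -1 9 10 100
print "@descending\n";    # prints 100 10 9 -1
```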

Split and Join

my @parts = split /\d+/, 'abd1234deff0exx';   # ('abd', 'deff', 'exx')
my $joined = join '-', @parts;                 # 'abd-deff-exx'
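split also accepts a pattern that matches a single literal character, or an empty pattern, which breaks a string into individual characters; join glues a list back together. A sketch with illustrative strings:

```perl
use strict;
use warnings;

my $record = 'atg,M,Met';
my @fields = split /,/, $record;      # ('atg', 'M', 'Met')

my @bases = split //, 'GATC';         # empty pattern: ('G', 'A', 'T', 'C')

my $rejoined = join '-', @bases;      # 'G-A-T-C'

print "$fields[2]\n";                 # prints Met
print "$rejoined\n";                  # prints G-A-T-C
```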

Swallowing Whole Files in a Single Gulp

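The <> operator reads from the files named on the command line, or from standard input if there are none. In list context it returns all the lines at once:

my @i = <>;    # every line of input, one per element
chomp @i;      # remove the newline from each element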
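To try this without a separate input file, the sketch below reads from an in-memory filehandle (a Perl 5.8+ feature) in place of <>; the text is made up. Reading a filehandle in list context works the same way:

```perl
use strict;
use warnings;

# Stand-in for a real input file: an in-memory filehandle.
my $text = "first line\nsecond line\nthird line\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @i = <$fh>;          # list context: all remaining lines, newlines included
close($fh);
chomp @i;               # strip the trailing newline from every element

print scalar(@i), "\n"; # prints 3
print "$i[1]\n";        # prints second line
```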

Array and Scalar Context

Perl evaluates every expression in either list context (also called array context) or scalar context, a notion that is unusual among programming languages. Usually you can remain unaware of it, but it matters for reverse, and it can be used to get the size of an array:
print reverse 'ab';          # prints ab!!! (in list context, reverse reverses the list, not the string)

my $ba = reverse 'ab';       # $ba contains 'ba' (reverse in scalar context)
print scalar reverse 'ab';   # prints ba
print scalar @z;             # prints the number of elements in @z
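A few more places where context decides the result, sketched with a small example array. Assigning an array to a scalar gives its size, while assigning it to a list in parentheses copies elements:

```perl
use strict;
use warnings;

my @z = ('a', 'b', 'c');

my $count   = @z;      # scalar context: the number of elements, 3
my ($first) = @z;      # list context: the first element, 'a'

print "$count $first\n";           # prints 3 a

if (@z > 2) {                      # a comparison also puts @z in scalar context
    print "more than two elements\n";
}
```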

A Hash Is a Lookup Table

A hash is a lookup table. We use a key to find an associated value.
my %translate;

$translate{'atg'} = 'M';
$translate{'taa'} = '*';
$translate{'ctt'} = 'K'; # oops
$translate{'ctt'} = 'L'; # fixed
print $translate{'atg'};

Getting All Keys

my @codons = keys %translate;   # keys come back in no particular order

Removing Key, Value Pairs

delete $translate{'taa'};

print join(',', sort keys %translate), "\n";   # prints atg,ctt

Initializing From a List

%translate = ( 'atg' => 'M',
               'taa' => '*',
               'ctt' => 'L',
               'cct' => 'P', );
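Putting the hash to work: a minimal sketch that translates a short, made-up DNA string codon by codon with this table. The pattern match with /g in list context returns every three-letter chunk. Marking codons that are missing from this four-entry table with 'X' is our own convention for the sketch, not a standard:

```perl
use strict;
use warnings;

my %translate = ( 'atg' => 'M',
                  'taa' => '*',
                  'ctt' => 'L',
                  'cct' => 'P', );

my $dna = 'atgcttccttaa';                 # hypothetical example sequence
my @codons = $dna =~ /(...)/g;            # ('atg', 'ctt', 'cct', 'taa')

# Look up each codon; 'X' marks any codon missing from this small table.
my $protein = join '', map { exists $translate{$_} ? $translate{$_} : 'X' } @codons;

print "$protein\n";                       # prints MLP*
```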


Basic Loops

Loops let you execute the same statements over and over again.


while Loops

A while loop has a condition at the top. The code in the body executes over and over as long as the condition is true, and the loop stops as soon as the condition becomes false.

while ( TEST ) {
Code to execute
} continue {
Optional code to execute at the end of each loop
}

Example: Count the number of times "potato" appears in a list

Code:

#!/usr/local/bin/perl
# file: spud_counter.pl

$count = 0;

while ( $word = shift ) { # shift takes the next command-line argument
    if ($word eq 'potato') {
        print "Found a potato!\n";
        $count++;
    } else {
        print "$word is not a potato\n";
    }
}

print "Potato count: $count\n";

Output:

(~) 51% spud_counter.pl potato potato tomato potato boysenberry

Found a potato!
Found a potato!
tomato is not a potato
Found a potato!
boysenberry is not a potato
Potato count: 3

Another Example: Count Upward from 1 to 5

Code:

#!/usr/local/bin/perl
# file: count_up.pl

$count = 1;
while ( $count <= 5 ) {
    print "count: $count\n";
    $count++;
}

Output:

(~) 51% count_up.pl

count: 1
count: 2
count: 3
count: 4
count: 5

Yet Another Example: Count Down from 5 to 1

Code:

#!/usr/local/bin/perl
# file: count_down.pl

$count = 6;
while ( --$count > 0 ) {
    print "count: $count\n";
}

Output:

(~) 51% count_down.pl

count: 5
count: 4
count: 3
count: 2
count: 1

The continue Block

while loops can have an optional continue block containing code that is executed at the end of each loop, just before jumping back to the test at the top:

#!/usr/local/bin/perl
# file: count_up.pl

$count = 1;
while ( $count <= 5 ) {
    print "count: $count\n";
} continue {
    $count++;
}

continue blocks will make more sense after we look at the loop control statements next, last and redo.


The until Loop

Sometimes you want to loop until some condition becomes true, rather than until some condition becomes false. The until loop is easier to read than the equivalent while (!TEST).

my $counter = 5;
until ( $counter < 0 ) {
    print $counter--,"\n";   # prints 5, 4, 3, 2, 1, 0
}
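To check that until ( TEST ) really behaves like while ( !TEST ), this sketch runs both forms side by side and collects the values in arrays instead of printing them one per line:

```perl
use strict;
use warnings;

my @from_until;
my $counter = 5;
until ( $counter < 0 ) {             # keep looping while the test is false
    push @from_until, $counter--;
}

my @from_while;
$counter = 5;
while ( !($counter < 0) ) {          # the equivalent while loop
    push @from_while, $counter--;
}

print "@from_until\n";               # prints 5 4 3 2 1 0
print "@from_while\n";               # prints 5 4 3 2 1 0
```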

foreach Loops

foreach will process each element of an array or list:

foreach $loop_variable ('item1','item2','item3') {
    print $loop_variable,"\n";
}

@array = ('item1','item2','item3');
foreach $loop_variable (@array) { # same thing, but with an array
    print $loop_variable,"\n";
}

@array = ('item1','item2','item3');
foreach (@array) { # same difference
    print $_,"\n";
}

The last example is interesting. It shows that if you don't explicitly give foreach a loop variable, the special scalar variable $_ is used.

Changing Values with the foreach Loop

If you modify the loop variable in a foreach loop, the underlying array value will change!

Code:

@h = (1..5);  # make an array containing numbers between 1 and 5

foreach $variable (@h) {
    $variable .= ' potato';
}

print join("\n",@h),"\n";

Output:

1 potato
2 potato
3 potato
4 potato
5 potato

This works with the automatic $_ variable too:

Code:

@h = ('CCCTTT','AAAACCCC','GAGAGAGA');

foreach (@h) {
    ($_ = reverse $_) =~ tr/GATC/CTAG/;   # reverse, then complement: the reverse complement
    print "$_\n";
}

Output:

AAAGGG
GGGGTTTT
TCTCTCTC
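Because the loop variable is an alias for each element, @h itself is changed by the loop above. If you want to keep the original values, one approach is to loop over a copy, as in this sketch:

```perl
use strict;
use warnings;

my @h = ('CCCTTT','AAAACCCC','GAGAGAGA');

my @rc = @h;                              # copy first; the loop changes only @rc
foreach (@rc) {
    ($_ = reverse $_) =~ tr/GATC/CTAG/;   # reverse complement in place
}

print "@h\n";     # prints CCCTTT AAAACCCC GAGAGAGA
print "@rc\n";    # prints AAAGGG GGGGTTTT TCTCTCTC
```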

Advanced Loops

The for Loop

Consider the standard while loop:

initialization code
while ( Test code ) {
    Code to execute in body
} continue {
    Update code
}

This can be written more concisely as a for loop:

for ( initialization code; test code; update code ) {
    body code
}

When the loop is first entered, the initialization code is executed. Before each pass through the loop, the test code is evaluated, and the loop stops as soon as it returns false. After each pass, the update code is executed.

Compare the process of counting from 1 to 5:

# with a while loop
$count = 1;
while ( $count <= 5 ) {
    print $count,"\n";
} continue {
    $count++;
}

# with a for loop
for ( my $count=1; $count<=5; $count++ ) {
    print $count,"\n";
}

Notice how we use my to make $count local to the for loop.
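The update code does not have to be $count++. A sketch of two variations, one stepping by 2 and one counting down, with the values collected into arrays:

```perl
use strict;
use warnings;

my @evens;
for ( my $i = 0; $i <= 10; $i += 2 ) {   # update step of 2 instead of 1
    push @evens, $i;
}

my @countdown;
for ( my $i = 5; $i >= 1; $i-- ) {       # count down by decrementing
    push @countdown, $i;
}

print "@evens\n";        # prints 0 2 4 6 8 10
print "@countdown\n";    # prints 5 4 3 2 1
```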

Fancy for() Loops

Any of the three for components can be omitted. You can even leave them all out to get an infinite loop:

for (;;) {
    print "Somebody help me! I can't stop!\n";
}

# equivalent to:
while (1) {
    print "Somebody help me! I can't stop!\n";
}

Any of the components can be a comma-separated list of expressions. This is usually used to initialize several variables at once:

# read until the "end" line or 10 lines, whichever
# comes first....
for (my $done=0, my $i=1; $i<=10 and !$done; $i++) {
    my $line = <STDIN>;
    chomp $line;
    $done++ if $line eq 'end';
}
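Here is a sketch that uses lists in both the initialization and the update parts: two index variables start at opposite ends of a made-up array and walk toward each other, swapping elements to reverse the array in place.

```perl
use strict;
use warnings;

my @seq = (1..6);

for ( my ($i, $j) = (0, $#seq); $i < $j; $i++, $j-- ) {
    my $tmp  = $seq[$i];     # swap $seq[$i] and $seq[$j]
    $seq[$i] = $seq[$j];
    $seq[$j] = $tmp;
}

print "@seq\n";   # prints 6 5 4 3 2 1
```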

Loop Control

The next, last, and redo statements allow you to change the flow of control in the loop mid-stream, as it were. You can use these three statements in while loops, until loops, and for and foreach loops, but not in the do-until and do-while variants.

next

The next statement causes the rest of the loop to be skipped and control to pass back to the conditional test at the top. If there's a continue block, it is executed before control returns to the top of the loop.

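$done = 0;
while (!$done) {
    $line = <STDIN>;
    if (!defined $line) { $done = 1; next; }   # end of input: stop after this pass
    chomp $line;
    next if $line eq 'SKIP';
    print $line,"\n";
} continue {
    $done++ if defined $line and $line eq 'END';
}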

last

The last statement causes the loop to terminate prematurely, even if the loop conditional is still true:

while ( $line = <STDIN> ) {
    chomp $line;
    last if $line eq 'END';
    print $line,"\n";
}
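To see next and last side by side without typing any input, this sketch runs a foreach loop over a hypothetical list of lines that stands in for STDIN:

```perl
use strict;
use warnings;

# Hypothetical input lines, standing in for what <STDIN> would return.
my @input = ('alpha', 'SKIP', 'beta', 'END', 'gamma');

my @printed;
foreach my $line (@input) {
    next if $line eq 'SKIP';    # skip the rest of this pass
    last if $line eq 'END';     # leave the loop entirely; 'gamma' is never seen
    push @printed, $line;
}

print "@printed\n";             # prints alpha beta
```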

redo

The redo statement is rarely used. It causes flow of control to jump back to the top of the loop body, like next. However, the loop condition is not tested again, the continue block, if any, is not executed, and in a for loop the update expression is not executed.

for (my $i=0; $i<10; $i++) {
    chomp ($line = <STDIN>);
    redo if $line eq 'SKIP'; # $i won't get incremented in this case
    print "Read line $i\n";
}

Nested Loops

If you have two or more nested loops, next, last and redo always apply to the innermost loop. You can change this by explicitly labeling the loop block and referring to the label in the loop control statement:

XLOOP:
for (my $x=0; $x<10; $x++) {
    for (my $y=0; $y<100; $y++) {
        next XLOOP unless $array[$x][$y] > 0;
        print "($x,$y) = $array[$x][$y]\n";
    }
}
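A runnable version of the labeled loop, assuming a small made-up two-dimensional array (an array of array references). Whenever the inner loop meets a value that is not positive, next XLOOP abandons the rest of that row and moves on to the next $x:

```perl
use strict;
use warnings;

# Hypothetical 2D data: three rows of four values each.
my @array = ( [ 1, 2, 0, 4 ],
              [ 5, 0, 7, 8 ],
              [ 9, 1, 2, 3 ] );

my @seen;
XLOOP:
for ( my $x = 0; $x < @array; $x++ ) {
    for ( my $y = 0; $y < @{ $array[$x] }; $y++ ) {
        next XLOOP unless $array[$x][$y] > 0;   # give up on this row
        push @seen, "($x,$y)";
    }
}

print "@seen\n";   # prints (0,0) (0,1) (1,0) (2,0) (2,1) (2,2) (2,3)
```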

Basic I/O

I/O means input/output, and is necessary to get computer programs to talk to the rest of the world.

The STDIN, STDOUT and STDERR Filehandles

Every Perl script starts out with three connections to the outside world:

STDIN: Standard input, used to read input. Initially connected to the keyboard, but it can be changed from the shell using redirection (<) or a pipe (|).

STDOUT: Standard output, used to write data out. Initially connected to the terminal, but it can be redirected to a file or another program from the shell using redirection or pipes.

STDERR: Standard error, used for diagnostic messages. Initially connected to the terminal, but it too can be redirected from the shell.

In addition to these three filehandles, you can create your own.
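As a sketch of creating your own filehandles, the example below writes two lines to a file and reads them back, using open with lexical filehandles. The temporary file comes from the core File::Temp module and simply stands in for a real data file; the tab-separated contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for a real data file; it is deleted at exit.
my (undef, $path) = tempfile(UNLINK => 1);

open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "atg\tM\n";          # no comma between the filehandle and the data
print $out "ctt\tL\n";
close($out) or die "Cannot close $path: $!";

open(my $in, '<', $path) or die "Cannot read $path: $!";
chomp(my @lines = <$in>);       # read every line, then strip the newlines
close($in);

print scalar(@lines), "\n";     # prints 2
print "$lines[1]\n";            # prints ctt, a tab, then L
```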

Reading Data from STDIN

To read a line of data into your program, use the angle-bracket operator:

$line = <STDIN>;

This reads one line from standard input and returns it as the result. You will usually assign the result to a scalar variable. The newline is not removed automatically; you have to do that yourself with chomp:

print "Type your name: ";
$name = <STDIN>;
chomp $name;
if ($name eq 'Jim Watson') {
    print "Hail great master!\n";
} else {
    print "Hello $name\n";
}

The read/chomp sequence is often abbreviated as:

chomp($name = <STDIN>);

The Input Loop

At the "end of file" (or when the user presses ^D to end input), <STDIN> returns whatever is left, which may or may not end with a newline. After that, <STDIN> returns the undefined value.

This leads to the typical input loop:

while ( $line = <STDIN> ) {
    chomp $line;
    # now do something with $line...
}

The while loop will read one line of text after another. At the end of input, the angle-bracket operator returns undef and the while loop terminates. Remember that even blank lines are TRUE, because they consist of a single newline character.
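A sketch of the same input loop reading from an in-memory filehandle instead of STDIN, with made-up input. The blank line is read as "\n", which is true. The final line is "0" with no newline, which would be false on its own, but Perl implicitly wraps a while ($line = <...>) test in defined(), so the loop still processes it:

```perl
use strict;
use warnings;

# Hypothetical input: a blank line, and a last line "0" with no newline.
my $text = "first\n\nlast\n0";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @seen;
while ( my $line = <$fh> ) {    # Perl tests defined(...) here implicitly
    chomp $line;
    push @seen, "[$line]";
}
close($fh);

print "@seen\n";   # prints [first] [] [last] [0]
```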

The Default Input Variable

If you don't assign the result of the angle-bracket operator to a scalar variable, it will default to the special scalar variable $_. This scalar is the default for a number of other functions, including chomp and the regular expression match.

This example prepends a line number to each line of its input.

Code:

#!/usr/local/bin/perl
# file: add_line_numbers.pl

$line_number = 0;
while ( <STDIN> ) {
    chomp;
    print $line_number++,": ",$_,"\n";
}

Output:

(~) 50% add_line_numbers.pl 

0: Gabor Marth gmarth@watson.wustl.edu
1: Genome Sequencing Center
2: Washington University School of Medicine
3: 4444 Forest Park Blvd.
4: St. Louis, MO 63108
5: 314 286-1839
6: 314 286-1810 (fax)
7: Dates: Oct 17-23
8:
9: Sean Eddy eddy@genetics.wustl.edu
10: Assistant professor
11: Department of Genetics
12: Washington University School of Medicine
13: 660 S. Euclid Ave.
14: St. Louis, Mo. 63110
15: 314 362-7666
16: 314 362-7855 (fax)
17: Dates: Oct 20-22
18:
19: Warren Gish gish@sapiens.wustl.edu
...

Assigning to an Array

Normally you assign the result of the angle-bracket operator to a scalar variable, getting one line of input. What if you assign it to an array? You get all the lines from the input file or terminal, one per array element!!!

It is convenient to pass this array to chomp, which will remove the newline from each member of the array.

@lines = <STDIN>;   # get all lines
chomp @lines;       # remove all newlines

Or you can do both things in one elegant operation:

chomp(@lines = <STDIN>);


Output

The print function writes data to output. In its full form, it takes a filehandle as its first argument, followed by a list of scalars to print:

print FILEHANDLE $data1,$data2,$data3,...

Notice there is no comma between FILEHANDLE and the data arguments. If FILEHANDLE is omitted, it defaults to STDOUT (this can be changed with select). So these are equivalent:

print STDOUT "Hello world\n";
print "Hello world\n";

To print to standard error:

print STDERR "Does not compute.\n";
