Tuesday, November 18, 2008

Perl Gotchas / Notes

http://www.perl.org/books/beginning-perl/
http://www.tizag.com/perlT/perlarrays.php

Perl interactive shell (via the debugger): $ perl -de 42

And from my bioinformatics class

Always use chomp after reading a line of input, to strip the trailing newline
Never put a comma after the filehandle in print, i.e. print OUT "Foo\n";
unshift @array, "Add"; adds elements to the beginning of the array

Functions: chomp, split, reverse, length, tr, substr
[ppt@bioinf lec04]$ perl -e 'print split /:/,"a:b:c:d"; print "\n";'
abcd
tr = translate (transliterate) $DNA =~ tr/ACTG/TGAC/; (no brackets needed: tr takes plain character lists, not regex character classes)
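
A minimal sketch of these functions working together, assuming a made-up DNA string ($dna and $revcomp are illustrative names, not from the class material). It builds a reverse complement with reverse and tr:

```perl
use strict;
use warnings;

my $dna = "ATGCCT\n";                # hypothetical input line, newline included
chomp $dna;                          # drop the trailing newline

# reverse the string (scalar context), then swap each base for its complement
my $revcomp = scalar reverse $dna;
$revcomp =~ tr/ACGT/TGCA/;

my @fields = split /:/, "a:b:c:d";   # four elements: a, b, c, d

print length($dna), "\n";            # 6
print "$revcomp\n";                  # AGGCAT
print substr($dna, 0, 3), "\n";      # ATG
print scalar(@fields), "\n";         # 4
```

Note that reverse only reverses a string in scalar context; in list context it reverses the order of a list instead.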

Perl variables
Scalars: $variable_name
Arrays: @array_name
Hashes: %hash_name
Subroutines: &subroutine_name
Filehandle: FILEHANDLE_NAME

The name must begin with a letter or underscore, and can contain as many letters, numbers or underscores as you like
$__A123; (OK) $12; (BAD)

****Preincrement vs. postincrement
$potatoes = 80; # $potatoes holds 80
$onions = ++$potatoes; # $onions holds 81, # $potatoes holds 81
$parsnips = $potatoes++; # $parsnips holds 81, # $potatoes holds 82

Arrays
$x = 65065;
$pi = 3.14;
@z = ($x, 'abA', $pi);
$#z equals the largest index of the array @z, so $z[$#z] returns the last element (as does $z[-1]) and $#z+1 returns the number of elements in the array

Push, Pop, Shift, Unshift
At the array end, push (add), pop (remove)
At the array start, unshift (add): unshift @z, 9; and shift (remove): $beginning = shift @z;
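
A short sketch of all four operations on one array, reusing the @z values from above (the variable names $first and $last are just for illustration):

```perl
use strict;
use warnings;

my @z = (65065, 'abA', 3.14);

push @z, 'end';            # add to the end
unshift @z, 9;             # add to the start
# @z is now (9, 65065, 'abA', 3.14, 'end')

my $first = shift @z;      # remove from the start: 9
my $last  = pop @z;        # remove from the end: 'end'

print "first=$first last=$last\n";                  # first=9 last=end
print "last index=$#z count=", scalar(@z), "\n";    # last index=2 count=3
print "last element=$z[$#z]\n";                     # last element=3.14
```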

my @i = <>;
chomp @i; #remove all newlines

Hash
%translate = ( 'atg' => 'M', 'taa' => '*', 'ctt' => 'L', 'cct' => 'P', );
@keys = keys %translate;
@values = values %translate;
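
As a sketch of what such a hash is good for, the lookup below translates a made-up sequence codon by codon. The $sequence value and the 'X' fallback for unknown codons are my own assumptions, not part of the lecture:

```perl
use strict;
use warnings;

my %translate = ('atg' => 'M', 'taa' => '*', 'ctt' => 'L', 'cct' => 'P');

my $sequence = 'atgcttccttaa';       # hypothetical sequence
my $protein  = '';
foreach my $codon ($sequence =~ /(.{3})/g) {
    # fall back to 'X' for codons missing from the table
    $protein .= exists $translate{$codon} ? $translate{$codon} : 'X';
}
print "$protein\n";                  # MLP*

# keys come back in no particular order, so sort them for stable output
foreach my $key (sort keys %translate) {
    print "$key => $translate{$key}\n";
}
```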

lec05 - Advanced Perl programming
(see bioinf.mbb.sfu.ca / lec05 / lec05.pl)

Filehandles
- read and write: open FILEHANDLE, "+<cosmids.fasta"
- read: "my.in" or "<my.in", e.g. open(FILEHANDLE, "my.in")
- write (overwrites the file): ">my.out"
- append: ">>my.out"
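
A sketch of write, append and read in one script. Note this swaps in the three-argument form of open with lexical filehandles ($out, $in) instead of the bareword two-argument form in the notes. The temporary directory from File::Temp is only there so the example can run anywhere:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $file = "$dir/my.out";            # hypothetical file name

open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "Foo\n";                  # no comma after the filehandle
close $out or warn "Errors while closing filehandle: $!";

open($out, '>>', $file) or die "Cannot append to $file: $!";
print $out "Bar\n";
close $out or warn "Errors while closing filehandle: $!";

open(my $in, '<', $file) or die "Cannot read $file: $!";
my @lines = <$in>;                   # list context reads every line
close $in or warn "Errors while closing filehandle: $!";

chomp @lines;                        # remove all newlines
print "@lines\n";                    # Foo Bar
```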

Basic input/output (I/O)
- close: This will also happen automatically when your program ends, or if you reuse the same filehandle name.
- close IN or warn "Errors while closing filehandle: $!";
- file operators: print "Is a directory!\n" if -d '/usr/home';
- command line arguments: $argument = shift;
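- the <> operator reads a line from the files named on the command line, or from standard input (waiting for the user to type) when there are none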
- chomp() removes trailing newline character
- chop() removes the last character
- STDIN Standard input, read input; STDOUT Standard output, used to write data out; STDERR Standard error, used for diagnostic messages
- unless ($#ARGV + 1 == 1) { die "Need 1 parameter\n"; }

Comparisons
- numeric: ==, !=, <, >, <=, >=
- string: eq, ne, gt, lt, ge, le, =~ (pattern matching), !~ (negation of =~)

Control structures: if and unless
- if ($a < $b) { print "A<B\n"; } elsif ($a > $b) { print "A>B\n"; } else { print "A==B\n"; }
- Truth in Perl: the empty string, 0 (also the string '0'), the undefined value and the empty list are all false; everything else is true
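
A quick sketch to check the truth rules, using a few test values I picked myself. The surprising ones are the strings '0.0', '00' and ' ', which are all true because only the exact string '0' counts as false:

```perl
use strict;
use warnings;

# each entry: [ label, value to test ]
my @tests = (
    [ 'empty string', ''    ],
    [ 'number 0',     0     ],
    [ "string '0'",   '0'   ],
    [ 'undef',        undef ],
    [ "string '0.0'", '0.0' ],
    [ "string '00'",  '00'  ],
    [ 'single space', ' '   ],
);

my @truthy;                          # labels of the values that tested true
for my $t (@tests) {
    my ($label, $value) = @$t;
    printf "%-14s is %s\n", $label, $value ? 'true' : 'false';
    push @truthy, $label if $value;
}
```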

Loops
- while ($i < 10) { print $i, "\n"; $i++;} # stops when condition becomes false
- until ($i < 0) { print $i, "\n"; $i--;} # stops when condition becomes true
- foreach $fruit ('apple', 'banana', 'orange') { print $fruit,"\n"; }
- for ($i = 0; $i < 5; $i++) { print $i,", "; } print "\n";
- infinite loop: for (;;) {} and while(1) {}

Regular expression
- A regular expression is a string template against which you can match a piece of text.
- $_ is matched by default; $_ is the special variable that holds the current line when reading in a while (<FILEHANDLE>) loop
- $at = 'cat'; if ( $at =~ /At/i ) { print "Found regex\n" }
- $at = 'cat'; if ( $at !~ /bat/ ) { print "Regex not found\n" }
- "." character matches any single character except a newline
- Predefined characters, \d digit [0-9], \w word [a-zA-Z_0-9], \s whitespace [ \t\n\r\f], \S non-whitespace
- anchors: /^Start/ /End$/ /\bWord\b/
- quantifiers: ? {0,1} like true or false :D; + {1,}; * {0,}; {3} exactly three; {4,} at least 4 times
- /goa?t/ matches "goat" and "got". Also any text that contains these words.
- a match succeeds as soon as any part of the string fits the pattern; the whole string does not have to match
eg if ('goo' =~ /go?/) { print "goo matched\n"; } # matches the leading "go"
- grouping: /Who's afraid of the big (bad )?wolf\?/;
# matches "Who's afraid of the big bad wolf?" and
# "Who's afraid of the big wolf?"
- String substitution:
$h = "Who's afraid of the big bad wolf?";
$h =~ s/w.+f/goat/;
# yields "Who's afraid of the big bad goat?"
- Global matches: /g will match as many times as it can
- trick: @frame1 = $sequence =~ /(.{3})/g; # match 3 letters globally and convert scalar to array frame1
- my @col = split(/\t/, $_); my $id = $col[0];
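
A sketch pulling the substitution, global match and split notes together. The sequence and the tab-separated record are made up for illustration, and @frame2 (the same trick shifted by one letter) is my own addition:

```perl
use strict;
use warnings;

my $sequence = 'atgcttccttaag';                   # hypothetical, 13 letters
my @frame1 = $sequence =~ /(.{3})/g;              # atg ctt cct taa (leftover 'g' is dropped)
my @frame2 = substr($sequence, 1) =~ /(.{3})/g;   # tgc ttc ctt aag

my $h = "Who's afraid of the big bad wolf?";
$h =~ s/w.+f/goat/;                               # lowercase 'w' first appears in "wolf"

my $line = "gene1\t120\t+\n";                     # hypothetical tab-separated record
chomp $line;
my @col = split /\t/, $line;
my $id  = $col[0];

print "@frame1\n";       # atg ctt cct taa
print "@frame2\n";       # tgc ttc ctt aag
print "$h\n";            # Who's afraid of the big bad goat?
print "$id\n";           # gene1
```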

Advanced topics:
Subroutine
- subroutines in Perl: blocks of code that can accept, operate on, and/or return variables
- call: say_hello_to("Dan");
- defn: sub say_hello_to { my $name=shift; }
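
A sketch of a subroutine that returns a value. The body of say_hello_to is my guess at what it might do, and gc_content is a hypothetical second example showing arguments unpacked from @_:

```perl
use strict;
use warnings;

sub say_hello_to {
    my $name = shift;                # shift takes the first argument off @_
    return "Hello, $name!";
}

# hypothetical helper: fraction of G and C letters in a sequence
sub gc_content {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/GCgc//);    # tr with an empty replacement just counts
    return length($seq) ? $gc / length($seq) : 0;
}

print say_hello_to("Dan"), "\n";           # Hello, Dan!
printf "%.2f\n", gc_content('ATGC');       # 0.50
```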

module
- use File::Basename; dirname('/usr/bin'); basename('/usr/bin/perl');
- $ perldoc File::Basename

BioPerl
- http://bioperl.org
A bioinformatics toolkit for:
- format conversion, report processing, data manipulation, sequence analyses, batch processing

[ppt@bioinf tut10]$ perl -e "use File::Basename; print basename('/usr/bin/perl'); "
perl[ppt@bioinf tut10]$

Test if file exists
$ perl -e 'print "f" if -e "test.pl"'

Concatenate two arrays, slice, push:
$ cat ./list-concat.pl
#!/usr/bin/perl -w

my @a = ('a','b','c');
my @b = (1,2,3);
my @c = (@a, 0, @b);
my @d = ('d','e','f');
push @d, (0, @b);

print "A=@a\n";
print "B=@b\n";
print "C=@c\n";
print "C[1..3]=@c[1..3]\n";
print "d=@d\n";

$ ./list-concat.pl
A=a b c
B=1 2 3
C=a b c 0 1 2 3
C[1..3]=b c 0
d=d e f 0 1 2 3

Slicing arrays:
print "C[1..3]=@c[1..3]\n"; # outputs C[1..3]=b c 0


qw = quote words - given a string of unquoted words separated by spaces, each word is surrounded by quotes eg.

my @qstr = qw (This is the world I live in.); # output qstr=This is the world I live in.

So you don't need to quote each word or put commas between words when using qw

you can do this too (unquoted barewords; works only without use strict, so not recommended)

my @bstr = (This, is, the, world, I, live, in,);

but this throws a syntax error because the period after 'in' is parsed as the string concatenation operator

my @bstr = (This, is, the, world, I, live, in.,);
