
Friday, July 25, 2008

Regular Expressions (more)

There are many more options that can be used with PERL regular expressions.

The options that I have used are: g, i, s, m, o, x
These options are specified like this:
/reg-exp/options
One or more options can be combined. s and m can even be used together, because they change different things: s affects '.', while m affects '^' and '$'.

So here are the meanings of these options ...

g -> Match Globally:
Use this option to find every occurrence of a pattern in the string, not just the first one. For example, this counts how often the word appears:
$you_appeared++ while(/\s+you\s+/g);

i -> Ignore Case:
Use this when case of the pattern does not matter.

s -> Treat Workspace As Single Line:
Use this option when the special character '.' must also match "\n". Put simply, use it when the workspace contains "\n"s and your pattern has to match across them.
my $workspace = "I am sure you are confused\nYou have to use it to understand it\n";
# without the s option '.' would stop at the "\n" and nothing would match
if ($workspace =~ /confused.You/s) {
    print "$&$'\n";
}

m -> Treat Workspace As Multiple Lines:
This option differs from the one above: it does not change what '.' matches. Instead, '^' and '$' match at the start and end of every line inside the workspace, not only at the start and end of the whole string.

o -> Compile Pattern Once:
The effect is visible when the pattern contains a variable. Since the pattern is compiled only once, the variable is interpolated only once, no matter how many times the pattern is used or how the variable changes afterwards.

x -> Extended Formatting:
Whitespace inside the pattern is ignored and you can add comments inside the regexp, using '#'.
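
Here is a small sketch that tries the g, i, s, m and x options on one string. The sample text and the variable names are made up for illustration only.

```perl
use strict;
use warnings;

my $text = "Hello you \nand You too\n";

# g: count every match, not only the first one (i makes it case-insensitive)
my $count = () = $text =~ /\byou\b/gi;

# s: '.' is allowed to match "\n"
my $dot_crosses_newline  = ($text =~ /you.*and/s) ? 1 : 0;
my $dot_stops_at_newline = ($text =~ /you.*and/)  ? 1 : 0;

# m: '^' matches at the start of every line
my @line_starts = $text =~ /^(\w+)/mg;

# x: whitespace and '#' comments inside the pattern are ignored
my ($first_word) = $text =~ /
    ^ (\w+)     # first word of the string
    \s+ you     # followed by "you"
/x;

print "count=$count s=$dot_crosses_newline no_s=$dot_stops_at_newline\n";
print "line starts: @line_starts\n";
print "first word: $first_word\n";
```

Without s the second test fails because '.' cannot step over the "\n" between "you" and "and".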


Monday, July 21, 2008

Reading ... Processing file paragraph by paragraph

There are many situations in which you may want to process a text file paragraph by paragraph ...

One such example was this: I wanted to delete the paragraphs that contained a particular pattern from a text file. For instance, delete all paragraphs that have text like 'copyright protected by blah blah'.

The first thing is to learn how to read a text file paragraph by paragraph. For that, we will first see how to open a file (named web_extract.txt):
open (FILE, "web_extract.txt") or die "Unable to open web_extract.txt: $!\n";

This is how you can read the opened file line by line:
while(my $line = <FILE> ) {
    # ... do something with $line ...
}


But to read a file in paragraph mode, you have to set the special variable $/ (the input record separator) to an empty string. Look at the code below:
{
local $/ = '';
@paragraphs = <FILE>;
chomp @paragraphs;
}


So this will read the opened file in paragraph mode.

Don't worry about the block; it only limits the change to $/, because local restores the old value when the block ends.

Now the variable @paragraphs has the paragraphs as its elements. So you can loop over this array and push the elements that do not match your pattern to a new array (@filtered_paragraphs). Then print that new array to the same file or to another one.
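
The filtering step described above could be sketched like this. To keep the sketch self-contained it reads from an in-memory string instead of web_extract.txt; the sample text and the helper name filter_paragraphs are assumptions for illustration.

```perl
use strict;
use warnings;

# Sample text standing in for web_extract.txt (made up for this sketch)
my $text = "First paragraph\nstill first.\n\n"
         . "copyright protected by blah blah\n\n"
         . "Last paragraph.\n";

# Return the paragraphs of $content that do NOT match $pattern
sub filter_paragraphs {
    my ($content, $pattern) = @_;
    open(my $fh, '<', \$content) or die "Unable to open in-memory file: $!\n";
    local $/ = '';                 # paragraph mode
    my @paragraphs = <$fh>;
    close $fh;
    chomp @paragraphs;             # strips the trailing newlines of each paragraph
    return grep { !/$pattern/ } @paragraphs;
}

my @filtered_paragraphs = filter_paragraphs($text, qr/copyright protected/i);
print join("\n\n", @filtered_paragraphs), "\n";
```

For a real file, open the file name instead of the string reference and print the joined result to the output file.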

Done!!!

Listing files created/modified between X and Y minutes ago

I have seen many questions on this ->

How do I get a list of files that were created/modified between 30 and 60 minutes ago? Or between 1 and 2 hours ago ... etc.

Here is a simple PERL script to do this job:

use strict;
use File::Find;

my $path_to_start = $ARGV[0];
my @files_bet_30_60 = ();
my $thirty_mins_in_secs = 30*60; # change this value to move the lower limit
my $sixty_mins_in_secs = 60*60; # change this value to move the upper limit

finddepth(\&wanted, $path_to_start);
foreach my $file (@files_bet_30_60) {
    print "$file\n";
}

sub wanted {
    # File::Find changes into each directory, so test $_ (the bare file name)
    # and report $File::Find::name (the full path)
    return unless (-f $_); # 'return', not 'next': this is a sub, not a loop body

    my $mtime = (stat($_))[9];
    my $time_diff = time - $mtime;
    push @files_bet_30_60, $File::Find::name if (($time_diff > $thirty_mins_in_secs) && ($time_diff < $sixty_mins_in_secs));
}
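The following shell command can also do this, provided min60 and min30 are reference files whose modification times are 60 and 30 minutes in the past (they can be created with touch):
find /path/to/files -type f \( -newer min60 -a ! -newer min30 \)
With GNU find, -mmin gives roughly the same result without the reference files:
find /path/to/files -type f -mmin +30 -mmin -60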

Thursday, July 17, 2008

Video: PERL tutorial that I found on YouTube

Video Playlist: PERL tutorials that I found on YouTube

Wednesday, July 16, 2008

PERL script to check if a process is alive

Below I have a PERL script that checks whether a given process (passed as a command-line argument) is alive. If the process has died for some reason, the script re-invokes it.

Put this in crontab & schedule it to run every hour or so.
This will take care of keeping your ever-running processes alive.

The script:

#!/usr/bin/perl

my $process_to_check = $ARGV[0] or die "Usage: $0 <process>\n";

open(PS,"/bin/ps -aef|") || die "Can't Open PS: $!\n";

while (<PS>) {
    chomp;
    # skip this script's own entry: its command line contains the name too
    next if (split)[1] eq $$;
    if (/\Q$process_to_check\E/) { close PS; exit;}
}

close PS;
system("$process_to_check");

Collect files that were created/modified in last 30 to 60 minutes

I actually wrote this script in reply to a post.

This is what it does ...
Given a path (as a command-line argument), the script searches the file system below it & lists the files that were created/modified in the last 30-60 minutes.

The script:

use strict;
use File::Find;

my $path_to_start = $ARGV[0];
my @files_bet_30_60 = ();
my $thirty_mins_in_secs = 30*60;
my $sixty_mins_in_secs = 60*60;

finddepth(\&wanted, $path_to_start);
foreach my $file (@files_bet_30_60) {
    print "$file\n";
}

sub wanted {
    # File::Find changes into each directory, so test $_ (the bare file name)
    # and report $File::Find::name (the full path)
    return unless (-f $_); # 'return', not 'next': this is a sub, not a loop body

    my $mtime = (stat($_))[9];
    my $time_diff = time - $mtime;
    push @files_bet_30_60, $File::Find::name if
        (($time_diff > $thirty_mins_in_secs) && ($time_diff < $sixty_mins_in_secs));
}

Sunday, July 13, 2008

A Little Guide to PERL Regular Expressions

Why RegExp?

Regular expressions are a powerful tool for pattern matching, and they are one of PERL’s strongest features.

Starting with Regular Expressions:

Before introducing regular expressions, we shall see what $_ is and how it is relevant to them.
$_ is a PERL special variable which is the default workspace for the regular expression operator.

However, $_ is also the default workspace for many functions in PERL. For example, if you just say print in your script it prints the contents of $_; the same goes for chop, chomp, etc.

Coming back to our topic, we shall start with the pattern match operator of PERL: /pattern/

It is as powerful as it is simple to read.

/pattern/ on its own can be used if the workspace of the regular expression is $_. If you want to match a pattern against another variable, the operator is $variable =~ /pattern/

Let us see some simple regular expression examples:
1. Match vikas in this line "my name is vikas":
$variable = "my name is vikas"; # variable to match
$variable =~ /vikas/; # matching the pattern 'vikas' in $variable
2. The above example if the workspace is $_:
$_ = "my name is vikas";
/vikas/;

How to use this actually?

These operators can be used with if, while & even foreach constructs. I will explain the usage with an example:

Scenario: [Extracting data by matching a pattern from a log file]

Say you have a log file that saved your chats & it is important to get all your friends’ chat ids. The log is very big … maybe because it logs all sessions in a single file.

With that story told … you decide to use regular expressions to extract all the chat ids.

Now all that you have to do is to find a pattern in the log which you can match … You open the file & you see this line:

##### Start of chat with xyz@bahoo at 05:00 PM 25-01-08 #####

This format is repeated for every chat. So your pattern can be:
/^\s*#+\s+Start of chat with\s+(\w+\@\w+)\s+/

This pattern uses anchor characters, repetitions, parentheses & character classes, which I will explain.

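So … here are the anchor characters that you can use:
^ Match beginning of the string (or of each line with the m option) -- used in my example
\A Match beginning of the string in any match mode
$ Match end of the string, or just before a final newline (or end of each line with the m option)
\Z Match end of the string, or just before a final newline, in any match mode
\z Match the very end of the string in any match mode
\b Match word boundary
\B Match non-word boundary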

Here are the repetition characters:
? Zero or one occurrence of the previous item
* Zero or more occurrences of the previous item – used in my example
+ One or more occurrences of the previous item – used in my example
There are more repetition operators (such as {n}, {n,} and {n,m}), which I will not cover here.

Here are the parentheses/capturing groups:
(...) Group several characters together for later use or capture them as a single unit. The matched values are stored in special variables named $1, $2, $3 …

All this said, we will move on to our example & look at the code:

# first open the log file
open (LOG, "chat.log") or die "Unable to open chat.log: $!\n";
my @all_contents = <LOG>; # read all contents into an array
close LOG;
foreach my $line (@all_contents) {
    if ($line =~ /^\s*#+\s+Start of chat with\s+(\w+\@\w+)\s+/) {
        print "$1\n";
    }
}

The above script will print all the chat ids that the log contains.
For more about regular expressions visit this link: http://perldoc.perl.org/perlre.html
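
To try the anchors, repetitions and capturing groups without a real log file, here is a small sketch that runs an extended version of the pattern over a few made-up log lines. The sample lines, the extra groups for the time and the helper name extract_chats are assumptions for illustration.

```perl
use strict;
use warnings;

# Made-up log lines in the format shown above
my @log_lines = (
    "##### Start of chat with xyz\@bahoo at 05:00 PM 25-01-08 #####\n",
    "xyz: hello\n",
    "##### Start of chat with abc\@bahoo at 06:30 PM 26-01-08 #####\n",
);

# Collect each chat id together with the time that follows it
sub extract_chats {
    my @chats;
    foreach my $line (@_) {
        if ($line =~ /^\s*#+\s+Start of chat with\s+(\w+\@\w+)\s+at\s+(\d+:\d+)\s+([AP]M)\b/) {
            push @chats, "$1 $2 $3";   # $1 = id, $2 = time, $3 = AM or PM
        }
    }
    return @chats;
}

print "$_\n" for extract_chats(@log_lines);
```

Each pair of parentheses fills the next numbered variable, so the id lands in $1, the time in $2 and AM/PM in $3.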

Thursday, July 10, 2008

About foreach

After a long time...
This post is about the foreach loop construct. There is something special about the foreach loop that I noticed recently. Consider this script:

my @array = (1,2,3,4,5,6,7,8,9,0);
foreach my $ele (@array) {
print "$ele\n";
}


Yes, this simply prints the array.

The specialty of foreach is that $ele is not a copy: it is an alias for each element in turn. This is true even if you loop over a dereferenced array reference. This means that if you modify $ele inside foreach, the array itself is affected.

my @array = (1,2,3,4,5,6,7,8,9,0);
foreach my $ele (@array) {
print "$ele\n";
$ele = '' if ($ele == 5);
}


Run the above script & then print all the array elements; you will notice that 5 is gone (it has been replaced by an empty string).
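
Here is a short sketch of the aliasing, next to one way to avoid it by copying each element first. The second array and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my @array = (1, 2, 3, 4, 5);

# $ele is an alias: assigning to it changes @array itself
foreach my $ele (@array) {
    $ele = '' if $ele == 5;
}

# Copy each element first if the original array must stay untouched
my @untouched = (1, 2, 3, 4, 5);
foreach my $ele (@untouched) {
    my $copy = $ele;
    $copy = '' if $copy == 5;
}

print join(',', @array), "\n";       # last element is now an empty string
print join(',', @untouched), "\n";   # still holds 5
```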

I have discussed this with the PerlMonks; visit this thread for more info:
http://perlmonks.org/?node_id=696953

Wednesday, July 2, 2008

Always confused with 2>&1

For those confused, I am talking about stdout & stderr, and the redirection operators.

This is more about commands used in Linux terminals.

Consider this command:
tar -cvf filename.tar

If you run this command it prints a lot on stdout (terminal). So if you want to redirect this output to /dev/null you would do that using:
tar -cvf filename.tar 1>/dev/null
(or shorter, since 1 is the default)
tar -cvf filename.tar >/dev/null

If you are using tar in a script & want to capture/redirect the stdout of tar in a file, you can use:
tar -cvf filename.tar >tar_stdout.log
(here > symbol is used to redirect)

Now, suppose there is an error while tarring (say the disk is full). Then the command prints the error on stderr & exits. To capture the errors as well, you redirect both stderr & stdout to files.

There are two scenarios here:
1. You want to redirect to two separate files:
tar -cvf filename.tar 1>tar_stdout.log 2>tar_stderr.log
2. You want to redirect stdout & stderr to a single file:
tar -cvf filename.tar >tar_stds.log 2>&1

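Remember always (this applies to all commands):
Redirections are not passed to the process as arguments; the shell sets them up itself, one after the other from left to right, before it starts the command. In the previous example 2>&1 (the instruction to the shell to send 2 wherever 1 currently points) should be at the end, so that >tar_stds.log is arranged first & 2>&1 then follows stdout into the same file. Written the other way round (2>&1 >tar_stds.log), stderr would still go to the terminal. A good analogy for this left-to-right order is:
mkdir dir1 dir1/dir2 --> Valid (dir1 exists by the time dir1/dir2 is created)
mkdir dir1/dir2 dir1 --> Invalid (dir1 does not exist yet when dir1/dir2 is attempted)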

Redirection operators include:
> (output redirect)
< (input redirect)
>> (output redirect but append)

The same redirect operators can be used to redirect I/O to other file descriptors (file numbers) OR to files. The distinction is:
tar -cvf file.tar 1>stdout.log -> this acts as redirect to file
tar -cvf file.tar 1>&2 -> this acts as redirect to another file number(notice the &)
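
The same redirections work when a command is run from a PERL script with backticks. This is a minimal sketch, assuming a Unix-like system where sh is available; the command in $cmd is a made-up stand-in that writes one line to stdout and one to stderr.

```perl
use strict;
use warnings;

# Hypothetical command that writes one line to stdout and one to stderr
my $cmd = q{sh -c 'echo to-stdout; echo to-stderr 1>&2'};

# Backticks capture stdout only; stderr is sent to /dev/null here
my $stdout_only = `$cmd 2>/dev/null`;

# 2>&1 sends stderr to wherever stdout points, so both lines are captured
my $both = `$cmd 2>&1`;

print "stdout only:\n$stdout_only";
print "both:\n$both";
```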

PS:-
I use bash

PERL References


Familiar with C pointers?

In C, pointers are variables that contain the location of some other piece of data. That is, a pointer can hold a machine address.

Along similar lines, PERL also supports pointer-like variables called references.

If you do not understand the pointer concept, here is an illustration (I had read this in a book on the 8085, long back):
Picture two cups, c1 and c2.

Cup c1, analogous to a pointer variable, holds a piece of paper with the name of another cup (its address) on it. That other cup, c2, is the one that has the juice.

Similarly, in PERL consider c1 to be a reference to variable c2. A reference can point to a scalar, an array, a hash or a subroutine.

Syntax:
my $scalar_ref = \$scalar_a;
my $array_ref = \@array_a;
my $hash_ref = \%hash_a;
my $sub_ref = \&find_average;

Here the backslash (\) operator is used to create references. References can be anonymous too, that is, they can point to data that has no name of its own. Anonymous arrays and hashes can be used to create complex data structures such as arrays of hashes, hashes of hashes, etc.

Also references can be created to file handles & type globs.

References are just like other scalar variables. Perl also offers a function ref() to check the type of reference a scalar holds. For example:

$ref_type = ref($scalar_ref);

ref() returns (without double-quotes)
"SCALAR" for scalar refs
"ARRAY" for array refs
"HASH" for hash refs
"CODE" for subroutine refs
It returns an empty string if the scalar does not hold a reference. For more info on this you can visit: http://perldoc.perl.org/functions/ref.html

And for more info on references, visit: http://perldoc.perl.org/perlreftut.html
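
Here is a small sketch that creates the four kinds of references from the syntax section, dereferences them and checks them with ref(). The data values, the body of find_average and the $cups structure are made up for illustration.

```perl
use strict;
use warnings;

sub find_average {
    my @numbers = @_;
    return 0 unless @numbers;
    my $sum = 0;
    $sum += $_ for @numbers;
    return $sum / @numbers;
}

my $scalar_a = 10;
my @array_a  = (2, 4, 6);
my %hash_a   = (name => 'c2');

my $scalar_ref = \$scalar_a;
my $array_ref  = \@array_a;
my $hash_ref   = \%hash_a;
my $sub_ref    = \&find_average;

# Dereferencing: get back to the data the reference points to
print "scalar: $$scalar_ref\n";
print "array:  @$array_ref\n";
print "hash:   $hash_ref->{name}\n";
print "sub:    ", $sub_ref->(@$array_ref), "\n";

# Anonymous array of hashes
my $cups = [ { name => 'c1' }, { name => 'c2' } ];
print "second cup: $cups->[1]{name}\n";

# ref() reports the kind of reference, or '' for a plain scalar
print join(' ', ref($scalar_ref), ref($array_ref), ref($hash_ref), ref($sub_ref)), "\n";
```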

Tuesday, July 1, 2008

About being 'strict' (use strict)

This post is about using the strict pragma.

I always use this pragma in my scripts. It helps you keep the script neat & clean.

By default (if you do not mention "my", "our" or "local"), PERL creates a variable the first time its name is used, as a global variable of the current package. This leads to errors when you misspell a variable name in your script, because the misspelled name silently becomes a new, empty variable.

For example:

Consider this script, of adding two numbers:
$variable1 = 10;
$variable2 = 20;
$result = $variable1 + $variable2;

This is a small script & may not have mistakes, but look at the bigger picture.

Suppose you made a mistake in the script, just a typo:
$variable1 = 10;
$variable2 = 20;
$result = $variable1 + $varaible2;

The result is annoying: $result is 10 instead of 30, and no error is reported. In a bigger script this looks like a logical bug, but it is only a typo. It can be caught using the strict pragma.

Strict is pretty strict: you have to explicitly mention the scope of the variables when you declare them. The above script (the one with the mistake) will now become:
use strict;
my $variable1 = 10;
my $variable2 = 20;
my $result = $variable1 + $varaible2;

Try running this script, you will get this error:
"Global symbol "$varaible2" requires explicit package name at test_strict.pl line 4.
Execution of test_strict.pl aborted due to compilation errors."

I think you have noticed the error by now.


Using the strict pragma affects subroutines, references & variables. You can even turn off strict for one of these, for example:
no strict "vars";

For more details, go to: perldoc.perl.org/strict.html
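
To see both behaviours side by side in one run, the sketch below compiles the script with the typo inside a string eval, so the strict error can be caught and printed, and then repeats the same addition in a block where strict "vars" is turned off. The use of eval here is only a way to keep the demo runnable, not something the strict pragma requires.

```perl
use strict;
use warnings;

# Compile the snippet at run time so the strict error can be caught and printed
my $ok = eval q{
    use strict;
    my $variable1 = 10;
    my $variable2 = 20;
    my $result = $variable1 + $varaible2;   # typo on purpose
    1;
};
my $error = $@;
print $ok ? "compiled fine\n" : "strict caught: $error";

# With strict 'vars' switched off in a small block, the same typo goes unnoticed
my $result = do {
    no strict 'vars';
    no warnings;
    my $variable1 = 10;
    my $variable2 = 20;
    $variable1 + $varaible2;                # undefined global, counts as 0
};
print "result without strict vars: $result\n";
```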

Bottom line :-
Always 'use strict', and use 'no strict' only where necessary.