Regular expressions are a powerful tool for pattern matching, and they are one of Perl's strongest features.
Starting with Regular Expressions:
Before introducing regular expressions, we shall see what $_ is and how it is relevant to them.
$_ is a Perl special variable that serves as the default workspace for the regular expression operator.
However, $_ is also the default workspace for many other commands in Perl. For example, if you just say print in your script with no arguments, it prints the contents of $_. The same goes for chop, chomp, etc.
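Here is a small sketch of $_ acting as the default workspace. The sample lines and the @cleaned array are made up for illustration only:

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from any real script.
my @lines = ("first line\n", "second line\n");
my @cleaned;

for (@lines) {            # with no loop variable, each element is aliased to $_
    chomp;                # same as chomp($_): removes the trailing newline
    push @cleaned, $_;    # keep a copy of the cleaned line
    print;                # same as print $_
    print "\n";
}
```

Because the loop aliases $_ to each element, chomp here also strips the newlines from @lines itself.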
Coming back to our topic, we shall start with Perl's pattern match operator: /pattern/
It is as powerful as it is simple to read.
/pattern/ on its own can be used when the workspace of the regular expression is $_. If you want to match a pattern against another variable, bind the pattern to it with =~, as in $variable =~ /pattern/
Let us see some simple regular expression examples:
1. Match vikas in this line “my name is vikas”:
$variable = "my name is vikas"; # variable to match
$variable =~ /vikas/; # matching the pattern 'vikas' in $variable
2. The same example when the workspace is $_:
$_ = "my name is vikas";
/vikas/;
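On their own, the two statements above throw the result away. The match operator returns true or false, so a minimal sketch of capturing that result could look like this (the $found_... variable names are mine, just for illustration):

```perl
use strict;
use warnings;

my $variable = "my name is vikas";

# Explicit target: =~ binds the pattern to $variable.
my $found_in_variable = ($variable =~ /vikas/) ? 1 : 0;

# Implicit target: a bare /pattern/ is matched against $_.
$_ = "my name is vikas";
my $found_in_default = (/vikas/) ? 1 : 0;

# A pattern that is not in the string gives a false result.
my $found_missing = ($variable =~ /perl/) ? 1 : 0;

print "matched in \$variable\n" if $found_in_variable;
print "matched in \$_\n"        if $found_in_default;
print "no match for perl\n"     unless $found_missing;
```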
How is this actually used?
These operators can be used with if, while & even foreach constructs. I will explain the usage with an example:
Scenario: [Extracting data by matching a pattern from a log file]
Say you have a log file of your saved chats, and you need to collect all your friends' chat IDs. The log is very big, maybe because it records every session in a single file.
With that story told, you decide to use regular expressions to extract all the chat IDs.
Now all you have to do is find a pattern in the log that you can match. You open the file & see this line:
##### Start of chat with xyz@bahoo at 05:00 PM 25-01-08 #####
This format is repeated for every chat, so your pattern can be:
/^\s*#+\s+Start of chat with\s+(\w+\@\w+)\s+/
This pattern uses several regular expression features: anchor characters, repetition operators, capturing parentheses & character classes (\s for whitespace, \w for word characters), which I will explain.
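Before going further, it is worth checking the pattern against the sample line. In this sketch the pattern is written with no literal space after the first \s+, because the sample line has only one space between the # characters and Start, and \s+ already consumes it. The variable names are mine:

```perl
use strict;
use warnings;

# Single quotes so that @bahoo is not treated as an array.
my $line = '##### Start of chat with xyz@bahoo at 05:00 PM 25-01-08 #####';

my $chat_id;
if ($line =~ /^\s*#+\s+Start of chat with\s+(\w+\@\w+)\s+/) {
    $chat_id = $1;    # text captured by the parentheses
    print "chat id: $chat_id\n";
}
```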
So … here are the anchor characters that you can use:
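^ Or \A Match beginning of the string (^ also matches at the start of each line under the /m modifier) -- used in my example
$ Or \Z Match end of the string, or just before a final newline ($ also matches at the end of each line under /m)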
\z End of string in any match mode
\b Match word boundary
\B Match non-word boundary
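To see the anchors at work, here is a rough sketch on a made-up string. The sample text and variable names are only for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample text: two lines, with "cat" both alone and inside a word.
my $text = "cat concatenate\ncat";

my $starts_with_cat = ($text =~ /^cat/) ? 1 : 0;   # ^  : "cat" at the start of the string
my @line_starts     = $text =~ /^cat/mg;           # /m : ^ also matches after each newline
my @whole_words     = $text =~ /\bcat\b/g;         # \b : "cat" as a separate word
my @inside_words    = $text =~ /\Bcat\B/g;         # \B : "cat" buried inside a longer word

# $ tolerates one trailing newline, \z does not.
my $with_newline   = "cat\n";
my $dollar_match   = ($with_newline =~ /cat$/)  ? 1 : 0;
my $little_z_match = ($with_newline =~ /cat\z/) ? 1 : 0;

print "starts with cat: $starts_with_cat\n";
print "line starts: ",  scalar(@line_starts),  "\n";
print "whole words: ",  scalar(@whole_words),  "\n";
print "inside words: ", scalar(@inside_words), "\n";
print "\$ matched: $dollar_match, \\z matched: $little_z_match\n";
```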
Here are the repetition characters:
? Zero or one occurrence of the previous item
* Zero or more occurrences of the previous item – used in my example
+ One or more occurrences of the previous item – used in my example
There are more repetition operators, which I will not list here.
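A quick sketch of the three repetition characters, using a made-up word list (grep simply keeps the words that match each pattern):

```perl
use strict;
use warnings;

# Hypothetical words that differ only in how many times "u" appears.
my @words = ("color", "colour", "colouur");

my @zero_or_one  = grep { /^colou?r$/ } @words;   # ? : zero or one "u"
my @zero_or_more = grep { /^colou*r$/ } @words;   # * : zero or more "u"
my @one_or_more  = grep { /^colou+r$/ } @words;   # + : one or more "u"

print "? matches: @zero_or_one\n";
print "* matches: @zero_or_more\n";
print "+ matches: @one_or_more\n";
```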
Here are the parentheses/capturing groups:
(...) Group several characters together for later use or capture as a single unit, and the matched values will be stored in special variables named $1, $2, $3 …
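As a small sketch of capturing, here is the time stamp from the sample log line pulled apart into pieces. The variable names are mine, chosen for illustration:

```perl
use strict;
use warnings;

# Time stamp borrowed from the sample log line.
my $stamp = "05:00 PM 25-01-08";

my ($hour, $minute, $period);
if ($stamp =~ /(\d+):(\d+)\s+(\w+)/) {
    # Groups are numbered by their opening parenthesis, left to right.
    ($hour, $minute, $period) = ($1, $2, $3);
    print "hour=$hour minute=$minute period=$period\n";
}

# In list context the match returns the captured values directly.
my @date_parts = $stamp =~ /(\d+)-(\d+)-(\d+)/;
print "date parts: @date_parts\n";
```

Copying $1, $2, $3 into named variables right away is a sensible habit, because the next successful match overwrites them.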
All this said, we will move on to our example & look at the code:
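# first open the log file
open(my $log, '<', 'chat.log') or die "Unable to open chat.log: $!\n";
# read every line of the log into an array, then close the file
my @all_contents = <$log>;
close $log;
# print the chat id captured by the parentheses on each matching line
foreach my $line (@all_contents) {
    if ($line =~ /^\s*#+\s+Start of chat with\s+(\w+\@\w+)\s+/) {
        print "$1\n";
    }
}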
The above script will print all the chat ids that the log contains.
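Since the log is very big and the same friend can show up in many sessions, one possible variation reads the file line by line with while and prints each chat id only once. This is only a sketch: the log content below is invented so that the script runs on its own, and %seen and @chat_ids are names I picked.

```perl
use strict;
use warnings;

# Hypothetical log content standing in for chat.log.
my $sample_log = <<'END_LOG';
##### Start of chat with xyz@bahoo at 05:00 PM 25-01-08 #####
xyz: hello
##### Start of chat with abc@bahoo at 06:15 PM 25-01-08 #####
abc: hi
##### Start of chat with xyz@bahoo at 09:30 AM 26-01-08 #####
END_LOG

# Open the string as an in-memory file; for a real log, open 'chat.log' instead.
open(my $log, '<', \$sample_log) or die "Unable to open sample log: $!\n";

my %seen;        # chat ids already collected
my @chat_ids;    # unique chat ids, in order of first appearance

while (my $line = <$log>) {    # reads one line at a time
    if ($line =~ /^\s*#+\s+Start of chat with\s+(\w+\@\w+)\s+/) {
        push @chat_ids, $1 unless $seen{$1}++;
    }
}
close $log;

print "$_\n" for @chat_ids;
```

Reading with while keeps only one line in memory at a time, which matters more as the log grows.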
For more about regular expressions visit this link: http://perldoc.perl.org/perlre.html