There are many situations where you may want to process a text file paragraph by paragraph.
One example: I wanted to delete the paragraphs of a text file that contained a particular pattern, such as every paragraph containing text like 'copyright protected by blah blah'.
The first thing is to learn how to read a text file paragraph by paragraph. For that, we first need to open the file (named web_extract.txt):
open(FILE, '<', "web_extract.txt") or die "Unable to open web_extract.txt: $!\n";
This is how you can read the opened file line by line:
while(my $line = <FILE> ) {
    # ... do something with $line ...
}
But to read a file in paragraph mode, you have to set the special variable $/ (the input record separator) to the empty string. Look at the code below:
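my @paragraphs;
{
    local $/ = '';        # empty string switches <FILE> to paragraph mode
    @paragraphs = <FILE>; # each element is one paragraph
    chomp @paragraphs;    # in paragraph mode, removes all trailing newlines
}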
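This reads the opened file in paragraph mode: each element of @paragraphs is one paragraph, and one or more blank lines count as a single separator.
Don't worry about the enclosing block. It limits the change to $/: local restores the previous value when the block ends, so any later read goes back to line-by-line mode.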
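To see paragraph mode in action without a file on disk, here is a minimal sketch that reads from an in-memory filehandle. The sample text and the variable names $text and $in are made up for illustration; only the handling of $/ comes from the code above.

```perl
use strict;
use warnings;

# Sample text kept in a string, so the sketch needs no external file.
# Note the run of blank lines between the first two paragraphs.
my $text = "first paragraph\nstill first\n\n\n\nsecond paragraph\n\nthird paragraph\n";

# Open an in-memory filehandle on the string (hypothetical stand-in for FILE).
open(my $in, '<', \$text) or die "Unable to open in-memory file: $!\n";

my @paragraphs;
{
    local $/ = '';        # paragraph mode, restored when the block ends
    @paragraphs = <$in>;
    chomp @paragraphs;    # strips all trailing newlines in paragraph mode
}
close($in);

print scalar(@paragraphs), " paragraphs\n";
print "[$_]\n" for @paragraphs;
```

The run of blank lines is treated as one separator, so the sample yields three paragraphs rather than extra empty ones.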
Now the array @paragraphs holds one paragraph per element. You can loop over it and push the elements that do not match your pattern onto a new array (@filtered_paragraphs), then print that new array to the same file or to another one. Since chomp removed the trailing newlines, remember to print a blank line between paragraphs when you write them out.
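The filtering step could look something like the sketch below. The pattern, the sample paragraphs and the output file name web_extract_filtered.txt are assumptions chosen for illustration; in a real script @paragraphs would come from the paragraph-mode read shown earlier.

```perl
use strict;
use warnings;

# Hypothetical pattern; adjust it to the text you want to drop.
my $pattern = qr/copyright protected by/i;

# Sample paragraphs, standing in for the array read in paragraph mode.
my @paragraphs = (
    "Some useful text\nspanning two lines",
    "This page is copyright protected by blah blah",
    "More useful text",
);

my @filtered_paragraphs;
for my $paragraph (@paragraphs) {
    # Keep only the paragraphs that do not match the pattern.
    push @filtered_paragraphs, $paragraph unless $paragraph =~ $pattern;
}

# Write the survivors to another file (assumed name), separated by blank lines.
open(my $out, '>', 'web_extract_filtered.txt')
    or die "Unable to open web_extract_filtered.txt: $!\n";
print {$out} join("\n\n", @filtered_paragraphs), "\n";
close($out) or die "Unable to close web_extract_filtered.txt: $!\n";
```

The loop with push follows the description above; the same filter can also be written in one line as my @filtered_paragraphs = grep { $_ !~ $pattern } @paragraphs;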
Done!!!