There will be situations at which the data you store in memory (variables) is huge, & the script cribbs & might throw "Out of memory" error (happens if the available memory is very low) & exit.
This had happened to me when extracting large amount of data & storing it into a data structure. The actual reason for this was not the data structure itself, but because of an array in which I used to read the entire files' contents (~35MB/file). The issue was solved once I started reading the file line-by-line.
That was just a part of the topic I wanted to write here... Actually, in the process of lowering the memory usage I had used some new (at least, its new for me) techniques of storing numerical data & that's what I want to write here.
This is the data structure that I had used:
%hash = {
"key1" => {
"kkey1" => "vvalue1"
...
},
...
};
So that's the pictorial of my data structure. In this I had to store some numbers that had to be paired (there was a relationship b/w them) & also for these numbers I had to calculate the average & median. The number-pairs found to be larger than 999, so I had to use a glue character b/w then number-pair & store as "kkey1" & store the number of occurrences as the "vvalue1".
Since it was average & median to be calculated... I wrote these two neat subroutines:
sub get_average {
my (%input_cnt) = @_;
my $count = 0;
my $sum = 0;
foreach my $input (keys %input_cnt) {
$sum += ($input*$input_cnt{$input});
}
foreach my $cnt (values %input_cnt) {
$count += $cnt;
}
my $average = $sum/$count;
return $average;
}
sub get_median {
my (%inputs_cnt) = @_;
my @sorted = sort { $a <=> $b } keys %inputs_cnt;
my %sorted_vals_index = (); # sorted_vals_index{delay} => 'low_index:high_index'
my $index = 0;
foreach my $ele (@sorted) {
my $h_i = $index + ($inputs_cnt{$ele}-1);
$sorted_vals_index{$ele} = "$index:$h_i";
$index = $h_i+1;
}
my $mid_index = ($index-1)/2;
foreach my $ele (keys %sorted_vals_index) {
my ($l_i, $h_i) = ($sorted_vals_index{$ele} =~ /(.+?):(.+)/);
if (($mid_index >= $l_i) && ($mid_index <= $h_i)) {
return $ele;
}
}
}
Each subroutine takes a hash, where the key is the number & value is the number of occurrences.
So the memory used became very less, since I avoided storing duplicate number-pairs.
0 comments:
Post a Comment