Perl Subroutines

A subroutine is a small user-defined, self-contained subprogram. Like Perl's built-in functions, a subroutine is invoked by name and may have arguments passed to it. A subroutine may return a scalar or list value.

Defining subroutines:

Subroutines are defined using the sub keyword, followed by the subroutine code in curly braces:

sub dictionary_order
{
@ordered = sort @_;
return @ordered;
}

The following is an error because the subroutine name is prefixed with & in the definition:

sub &dictionary_order # Fatal compile-time error
{
return sort @_;
}

Calling subroutines:

Subroutines are called by specifying their name, followed by a list of arguments:

@sorted = dictionary_order ("eat", "at", "Joes");
@sorted = dictionary_order (@unsorted);
@sorted = dictionary_order (@sheep, @goats, "shepherd", $goatherd);
@sorted = &dictionary_order("eat", "at", "Joes");

You can also call a subroutine without parentheses, provided it has already been declared or defined before the point of the call:

sub make_sequence # args: (from, to, step_size)
{
# To see the arguments, you can do any of the following.
print "@_";
print $_[0], " ", $_[1], " ", $_[2];
($from, $to, $step_size) = @_;
print $from, " ", $to, " ", $step_size;

@list = ();
for ($n = $_[0]; $n < $_[1]; $n+=$_[2])
{
push @list, $n;
}
return @list;
}

# then later...

@stepped_sequence = make_sequence $min, $max, $step_size;
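A minimal runnable sketch of the parenthesis-free call, assuming the subroutine is defined earlier in the file than the call. The my declarations and the literal arguments 0, 10, 3 are additions for illustration; the listing above uses global variables.

```perl
use strict;
use warnings;

# Defined before it is used, so the call below may omit the parentheses.
sub make_sequence    # args: (from, to, step_size)
{
    my ($from, $to, $step_size) = @_;
    my @list;
    for (my $n = $from; $n < $to; $n += $step_size)
    {
        push @list, $n;
    }
    return @list;
}

my @stepped_sequence = make_sequence 0, 10, 3;
print "@stepped_sequence\n";    # prints "0 3 6 9"
```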

Passing arguments:

Like any other list, an argument list that contains nested lists or arrays is "flattened." Therefore, at the start of the third call to dictionary_order above, @_ would contain the contents of the array @sheep, followed by the contents of @goats, the value "shepherd", and finally the scalar value stored in $goatherd. It is possible to pass two or more arrays to a subroutine and keep them "unflattened" by passing explicit references to them.
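The difference between flattened arrays and explicit references can be sketched as follows. The subroutine names count_flat and count_each, and the array contents, are hypothetical and chosen only for this illustration.

```perl
use strict;
use warnings;

# Flattened: the subroutine cannot tell where one array ends and the next begins.
sub count_flat
{
    return scalar @_;
}

# Unflattened: each argument is a reference to one array.
sub count_each
{
    my ($first_ref, $second_ref) = @_;
    return (scalar @$first_ref, scalar @$second_ref);
}

my @sheep = ("Dolly", "Shaun");
my @goats = ("Billy", "Nanny", "Gruff");

my $flat_count = count_flat(@sheep, @goats);      # one list of five elements
my @each_count = count_each(\@sheep, \@goats);    # two separate arrays

print "$flat_count; @each_count\n";    # prints "5; 2 3"
```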

Refer back to the first example under "Defining subroutines" above. The arguments passed to the subroutine are available within its code block via the special @_ array. The built-in function return causes execution of the subroutine to finish immediately, and the value specified after return becomes the result of the call. Using return is optional. If none is specified, the subroutine automatically returns the value of the last expression it actually evaluated.
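A short sketch of the two ways of returning a value. The names explicit_max and implicit_max are hypothetical; both subroutines are meant to behave identically.

```perl
use strict;
use warnings;

sub explicit_max
{
    my ($x, $y) = @_;
    return $x if $x > $y;    # return finishes the subroutine immediately
    return $y;
}

sub implicit_max
{
    my ($x, $y) = @_;
    $x > $y ? $x : $y;       # value of the last expression evaluated is returned
}

my $first  = explicit_max(3, 7);
my $second = implicit_max(3, 7);
print "$first $second\n";    # prints "7 7"
```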

Because a subroutine's arguments are passed to it in the special array @_, and because all arrays in Perl are dynamically sized, any subroutine may be passed any number of arguments.
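For instance, a subroutine that adds up its arguments needs no declared parameter count. This is a minimal sketch; the name total is hypothetical.

```perl
use strict;
use warnings;

sub total
{
    my $sum = 0;
    $sum += $_ for @_;    # @_ holds however many arguments were passed
    return $sum;
}

my @more  = (4, 5);
my $none  = total();               # no arguments at all
my $three = total(1, 2, 3);
my $mixed = total(1, @more, 10);   # @more is flattened into the list

print "$none $three $mixed\n";     # prints "0 6 20"
```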

Named arguments:

Suppose we want to implement a subroutine called listdir that provides the functionality of our operating system's directory listing command (i.e., dir or ls). Such a subroutine might take arguments specifying which files to list, what type of files to consider, whether to list hidden files, what details of each file should be reported, whether files and directories should be listed recursively, how many columns to use, and whether the output should be paged or just dumped.

But we certainly don't want to have to specify every one of those nine parameters every time we call listdir:

listdir(undef, undef, 1, 1, undef, undef, undef, 4, 1);

Some programming languages provide a mechanism for naming the arguments passed to a subroutine. Perl supports named arguments in a cunning way. If we pretend that a particular subroutine takes a hash, rather than a list, we can use the => operator to associate a name with each argument. For example:

listdir(cols=>4, page=>1, hidden=>1, sep_dirs=>1);

Inside the subroutine, we simply initialize a hash with the resulting contents of the @_ array. We can access the arguments by name, using each name as the key to an entry in the hash. For example, we can define listdir like so:

sub listdir
{
%arg = @_; # Convert argument list to hash

# Use defaults for missing arguments...

$arg{match} = "*" unless exists $arg{match};
$arg{cols} = 1 unless exists $arg{cols};
# etc.

# Use arguments to control behaviour...
@files = get_files( $arg{match} );
push @files, get_hidden_files() if $arg{hidden};
# etc.
}

Since the entries of a hash can be initialized in any convenient order, we no longer need to remember the order of the nine potential arguments, as long as we remember their names. Because hashes are flattened inside lists, if we have several calls that require the same subset of arguments, we can store that subset in a separate hash and reuse it:

%std_listing = (cols=>2, page=>1, sort_by=>"date");

listdir(file=>"*.txt", %std_listing);
listdir(file=>"*.log", %std_listing);
listdir(file=>"*.dat", %std_listing);

We can even override specific elements of the standard set of arguments, by placing an explicit version after the standard set. Then the explicit version will reinitialize (i.e. overwrite) the corresponding entry in the hash:

listdir(file=>"*.exe", %std_listing, sort_by=>"size");
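The override behaviour can be checked with a small stand-in. This is only a sketch: listdir_settings is a hypothetical subroutine that reports the settings it would use instead of listing files. It also supplies defaults by placing them first in the hash initializer, which is an alternative to the "unless exists" tests shown earlier, not the method used in the listing above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for listdir: returns the settings as a string.
sub listdir_settings
{
    # Defaults come first; any pairs passed by the caller overwrite them.
    my %arg = (match => "*", cols => 1, @_);
    return join ", ", map { $_ . "=" . $arg{$_} } sort keys %arg;
}

my %std_listing = (cols => 2, page => 1);

my $plain    = listdir_settings();
my $standard = listdir_settings(match => "*.txt", %std_listing);
my $override = listdir_settings(%std_listing, cols => 4);

print "$plain\n";       # prints "cols=1, match=*"
print "$standard\n";    # prints "cols=2, match=*.txt, page=1"
print "$override\n";    # prints "cols=4, match=*, page=1"
```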

Aliasing of parameters:

Elements of the @_ array are special in that they are not copies of the actual arguments of the function call. Rather, they are aliases for those arguments. That means that if values are assigned to $_[0], $_[1], $_[2], etc., each value is actually assigned to the corresponding argument with which the current subroutine was invoked. In other words, arguments are effectively passed by reference rather than by value. The following subroutine increments its first argument each time it's called, but keeps the result less than 10 at all times.

sub cyclic_incr
{
$_[0] = ($_[0]+1) % 10;
}

The result would be:

$next_digit = 8;
print $next_digit; # prints 8

cyclic_incr($next_digit);
print $next_digit; # prints 9

cyclic_incr($next_digit);
print $next_digit; # prints 0

Passing an unmodifiable value such as the literal 7, as opposed to a variable like $next_digit, would cause a fatal run-time error ("Modification of a read-only value attempted"). If you don't intend to change the values of the original arguments, it's usually a good idea to explicitly copy the @_ array into a set of variables.

sub next_cyclic
{
($number, $modulus) = @_;
$number = ($number+1) % $modulus;
return $number;
}

The variables $number and $modulus are still global, but their names make the code easier to read than $_[0] and $_[1]. To make them local to the subroutine, declare them with the my keyword.
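A sketch contrasting the two styles, with the copies declared using my. The variable names $digit, $next and $error are assumptions made for this illustration. The eval block is there only to trap the fatal error raised when a literal is passed to the aliasing version.

```perl
use strict;
use warnings;

sub next_cyclic
{
    my ($number, $modulus) = @_;    # lexical copies, private to this call
    $number = ($number + 1) % $modulus;
    return $number;
}

sub cyclic_incr
{
    $_[0] = ($_[0] + 1) % 10;       # modifies the caller's argument via the alias
}

my $digit = 9;
my $next  = next_cyclic($digit, 10);
print "$digit $next\n";             # prints "9 0": $digit is untouched

cyclic_incr($digit);
print "$digit\n";                   # prints "0": $digit was changed in place

# A literal cannot be modified, so this call dies; eval traps the error.
my $ok    = eval { cyclic_incr(7); 1 };
my $error = $ok ? "" : $@;
print "trapped: $error";
```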

Calling Context

When a subroutine is called, it's possible to detect whether it was expected to return

* a scalar value
* a list or
* nothing at all

These three possibilities define three contexts in which a subroutine may be called.

listdir(@files); # void context: no return value expected
$listed = listdir(@files); # scalar context: scalar return value expected
@missing = listdir(@files); # list context: list return value expected
($f1, $f2) = listdir(@files); # list context
print( listdir(@files) ); # list context

wantarray function

Perl has a built-in function, wantarray, that tells a subroutine what kind of value it is expected to return. The function returns

* undef if the current subroutine was not expected to return a value.
* "" (a defined but false value) if it was expected to return a scalar.
* 1 if it was expected to return a list.

We could use this information to select the appropriate form of return statement (and perhaps optimize for cases where the return value would not be used). For example:

sub listdir
{
# Do file listing, and then:

return @missing_files if wantarray();
return $listed_count if defined(wantarray());
}
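The three contexts can be observed directly with a small sketch. The subroutine note_context and the variable $last_context are hypothetical. The context is recorded in a variable because a void-context call has no return value to inspect.

```perl
use strict;
use warnings;

my $last_context;    # records the context of the most recent call

sub note_context
{
    $last_context = wantarray()          ? "list"
                  : defined(wantarray()) ? "scalar"
                  :                        "void";
    return;
}

my @list_result = note_context();
my $seen_list   = $last_context;

my $scalar_result = note_context();
my $seen_scalar   = $last_context;

note_context();
my $seen_void = $last_context;

print "$seen_list $seen_scalar $seen_void\n";    # prints "list scalar void"
```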

If a subroutine is always supposed to return a value, we could issue a warning whenever that return value is ignored:

use Carp;

sub listdir
{
# Do file listing, and then:

return @missing_files if wantarray;
return $listed_count if defined(wantarray);
carp "subroutine &listdir was called in void context";
}

We use the Carp::carp subroutine, instead of the built-in warn function, so that the warning reports the location of the call to listdir, instead of the location within listdir at which the problem was actually detected.

Determining a subroutine's caller

The Carp module is useful because it reports the location of a subroutine's caller, rather than the location of the subroutine's code.
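caller function

Unlike most languages, Perl makes it easy to determine where a subroutine was called from. The built-in caller function provides details of the caller. This function works differently in scalar and list context, and in list context it returns more when it is given an argument:

1. Scalar context
In scalar context, caller returns only the name of the package from which the current subroutine was called.

2. List context, no argument
Called without an argument in list context, caller returns:

1. the package from which the current subroutine was called.
2. the name of the file containing the code that called the current subroutine
3. the line in that file from which the current subroutine was called

3. List context, with an argument
Called in list context with an integer argument giving the number of call frames to go back (caller(0) describes the call to the current subroutine), caller returns: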

1. the package from which the current subroutine was called.
2. the name of the file containing the code that called the current subroutine
3. the line in that file from which the current subroutine was called
4. the name of the subroutine
5. whether the subroutine was passed arguments
6. the context in which the subroutine was called (the value returned by wantarray)
7. the actual source code that called the subroutine (but only if the call was part of an eval TEXT statement)
8. whether the subroutine was called as part of a require or use statement.
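A minimal sketch of caller in use. The subroutine name report_caller is hypothetical. The line number in the output depends on where the call appears in the file, so no exact output is shown.

```perl
use strict;
use warnings;

sub report_caller
{
    my ($package, $file, $line) = caller;    # list context, no argument
    my $subname = (caller(0))[3];            # fourth element: name of this subroutine
    return "$subname called from package $package at line $line";
}

my $report = report_caller();
print "$report\n";
```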

Prototypes

Subroutines can also be declared with a prototype, which is a series of specifiers that tells the compiler to restrict the type and number of arguments with which the subroutine may be invoked. For example, in the subroutine definition

sub insensitive_less_than ($$)
{
return lc($_[0]) lt lc($_[1]);
}

the prototype is ($$) and specifies that the subroutine insensitive_less_than can only be called with two arguments, each of which will be treated as a scalar -- even if it's actually an array. In other words, a $ prototype causes the corresponding argument to be evaluated in a scalar context. That means, for example, that a call like insensitive_less_than(@a, @b) will treat @a and @b as scalars. The two values passed to insensitive_less_than will be the lengths of @a and @b respectively, not their contents. This kind of subtlety is a good reason to avoid using a prototype, unless you're very confident that you know its full consequences.

Prototypes are only enforced when a subroutine is called using the name(args) syntax. Prototypes are not enforced when a subroutine is called with a leading & or through a subroutine reference. They are also ignored when an object method is called.
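The effect of a ($$) prototype, and of bypassing it with a leading &, can be sketched as follows. The subroutine show_pair and the array contents are hypothetical. The subroutine is defined before the calls so that the prototype is known when they are compiled.

```perl
use strict;
use warnings;

sub show_pair ($$)
{
    return "$_[0] and $_[1]";
}

my @a = (10, 20, 30);
my @b = (1, 2);

# Prototype enforced: each array is evaluated in scalar context (its length).
my $with_prototype = show_pair(@a, @b);

# Leading & bypasses the prototype: the arrays are flattened into @_ as usual.
my $without_prototype = &show_pair(@a, @b);

print "$with_prototype\n";       # prints "3 and 2"
print "$without_prototype\n";    # prints "10 and 20"
```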