1. #!/usr/bin/perl
2. blank line
3. # Blosxom
4. # Author: Rael Dornfest <rael@oreilly.com>
5. # Version: 2.0.2
6. # Home/Docs/Licensing: http://www.blosxom.com/
7. # Development/Downloads: http://sourceforge.net/projects/blosxom
8. blank line
9. package blosxom;
blosxom package statement
10. blank line
11. # --- Configurable variables -----
12. blank line
13. # What's this blog's title?
14. $blog_title = "My Weblog";
$blog_title initialization
Defines the weblog title
15. blank line
16. # What's this blog's description (for outgoing RSS feed)?
17. $blog_description = "Yet another Blosxom weblog.";
$blog_description initialization
Defines the weblog description
18. blank line
19. # What's this blog's primary language (for outgoing RSS feed)?
20. $blog_language = "en";
$blog_language initialization
Defines the weblog language
The comment in the source tells us that this is used for the outgoing RSS feed.
21. blank line
22. # Where are this blog's entries kept?
23. $datadir = "/Library/WebServer/Documents/blosxom";
$datadir initialization
Specifies weblog data directory
This is the root of the directory where you will keep your posts. The value should be the complete path from the filesystem root to and including the name of the data directory itself.
The path should begin with a leading slash '/' and should not include a trailing slash '/', though the script will strip a trailing forward-slash if it finds one.
You will need to create this directory yourself and set up permissions properly. The script will not create it for you.
24. blank line
25. # What's my preferred base URL for this blog (leave blank for automatic)?
26. $url = "";
$url initialization
Defines the base URL for the weblog
This is the address a visitor would type into the address bar of her browser to get to the blosxom.cgi script itself. Without any redirection, this value will end with the name of the script file.
eg:
http://sample.net/cgi-bin/blosxom.cgi
27. blank line
28. # Should I stick only to the datadir for items or travel down the
29. # directory hierarchy looking for items? If so, to what depth?
30. # 0 = infinite depth (aka grab everything), 1 = datadir only, n = n levels down
31. $depth = 0;
$depth initialization
Defines the depth that blosxom will plumb for any directory it's given to find posts in subdirectories.
If the value is 1, blosxom will look for posts only in the requested directory.
So, if the request is for the weblog homepage, then the script will only look for posts in $datadir (specified above). Posts in subfolders will be ignored.
This same value controls the depth blosxom will explore for requests that do not start at the root.
If a visitor requests:
http://sample.net/cgi-bin/blosxom.cgi/Technology/Computer/
and the $depth has a value of 1, then blosxom will look for posts only in the '../Computer/' directory.
A value of 2 will also include posts at /Computer/Apple/, assuming the subdirectory 'Apple' exists, as well as /Computer/Dell/ and /Computer/Lenovo/ if those subdirectories exist. In other words, each additional level of depth takes in every subdirectory at that level, not a single directory.
32. blank line
33. # How many entries should I show on the home page?
34. $num_entries = 40;
$num_entries initialization
Defines the max number of posts displayed on any single page, with the exception of date-based archive requests which always display all posts that match the requested date.
Though the comment in the source suggests that this value affects only the homepage, it in fact applies to all page requests (again, with the exception of date-based archives).
eg:
http://sample.net/cgi-bin/blosxom.cgi
will display a maximum of $num_entries posts.
http://sample.net/cgi-bin/blosxom.cgi/Technology/Computer/
Also displays a max of $num_entries posts, this time starting at the subdirectory '../Computer/'.
On the other hand
http://sample.net/cgi-bin/blosxom.cgi/2006/
Always displays all posts created in 2006, regardless of the value of $num_entries.
There is no way to request posts beyond $num_entries using only blosxom itself. You will find a plugin to handle this if it's something you need.
If your weblog has 1000 total posts and $num_entries is set to 40, only the 40 most recent posts are directly accessible from the homepage.
35. blank line
36. # What file extension signifies a blosxom entry?
37. $file_extension = "txt";
$file_extension initialization
Specifies the extension used for all files that blosxom should treat as posts.
$file_extension is the most basic mechanism for controlling inclusion/exclusion of files as posts.
The default, 'txt', seems like a good choice because:
html is another option, though it might prove to be somewhat confusing because:
Something like 'blosxom' or 'bsxm' can work too, but may be unwieldy depending on your platform or editor.
38. blank line
39. # What is the default flavour?
40. $default_flavour = "html";
$default_flavour initialization
Specifies which flavour (template) blosxom should use if none is specified explicitly by the browser.
For example when requesting
the homepage
http://sample.net/cgi-bin/blosxom.cgi,
a category
http://sample.net/cgi-bin/blosxom.cgi/Technology/Computer/,
or date-based archive
http://sample.net/cgi-bin/blosxom.cgi/2006/11/04/
Though you're free to use any flavour you like as the default, I highly recommend either 'html' or 'htm'.
Why?
Because this is a very typical extension in use on the web.
More important than what extension you choose is always using the same extension as your default flavour.
You may have several flavours available to blosxom at any given time and occasionally you may want to change the default.
For example, maybe you've decided to change flavours with the seasons.
Rather than changing the default flavour here from 'fall' to 'winter', I recommend that you rename the flavour corresponding to the current season to 'html', keeping the default name.
i.e.
In winter change the extension on your winter flavour components to html. Come spring, restore the .winter extension and rename your spring flavour components.
Why is this important?
When visitors access your site and do not specify a flavour, blosxom appends the default. If the visitor bookmarks the site, the link will include the default extension. When accessing your site in the future via the bookmark, the same visitor will always get that same flavour, even if you have changed the default.
Why?
Because the bookmark will include an extension, so the request will not resort to using the new default.
Moreover, if you ever remove a flavour that was a default at any time, it's possible that the visitor will be greeted with an error in the future, when blosxom is unable to find the flavour specified in the bookmark.
Finally, everything I've said so far is just as true for search engines as it is for people.
For example, if Google indexes a page at your site the address will include the default flavour. It follows that search results that Google returns in response to queries will include that extension.
Always renaming your intended default, so that the value of this variable does not change, is a great way to avoid these sorts of problems.
The discussion of persistent links and consistent naming is more involved than this. Ideally addresses should not include any extension at all. (It is possible to get this behavior with blosxom. In fact there is already a plugin that does this.)
For now, and considering only the blosxom script itself, the advice is:
avoid changing the value of $default_flavour - after the initial configuration of course.
41. blank line
42. # Should I show entries from the future (i.e. dated after now)?
43. $show_future_entries = 0;
$show_future_entries initialization
The value of this variable should be set to one of the two numeric values: 1 (one) or 0 (zero).
Think of these values as true and false respectively. This is precisely what they mean.
It is possible to postdate entries so that they appear to have been written at some future date or time.
One way to do this (there are others) is by using the Unix touch command.
See...
$ man touch
...for more info.
It is possible to use blosxom to accomplish the same thing with the help of a plugin or two.
Here you are instructing blosxom to either:
How might you use this feature?
You might leave this value set to 0 (zero) and postdate an entry to have it automatically show up on the specified date and time, so that you do not need to remember to post it yourself in the future.
44. blank line
45. # --- Plugins (Optional) -----
46. blank line
47. # Where are my plugins kept?
48. $plugin_dir = "/Library/WebServer/Data/Blosxom/plugins";
$plugin_dir initialization
Specifies the location of the weblog plugin directory.
This is the location where you will place any plugins you would like to use with blosxom.
There are quite a few plugins available.
Some of them are fairly specialized (i.e. probably not relevant to you and your weblog) and others are all but necessary. For example, if you consider a navigable calendar to be a defining feature of all weblogs, then you may consider the calendar plugin necessary.
In any case, all plugins live in this directory.
You should not (in fact you cannot) place active plugins in subdirectories of $plugin_dir; they must all reside in this folder directly.
These files are not intended to be directly viewed by visitors to your site. So, it should not be possible for a visitor to use their browser to navigate to the folder containing the plugins.
Assuming permissions on your server are set up properly, it is very simple to accomplish this by making sure the plugins directory is not within the webserver's document root.
The value should be the complete path from the filesystem root to, and including, the name of the plugin directory itself.
The path should begin with a leading slash '/' and should not include a trailing slash '/', though the script will strip a trailing forward-slash if it finds one.
You will need to create this directory yourself and set up permissions properly. The script will not create it for you.
49. blank line
50. # Where should my modules keep their state information?
51. $plugin_state_dir = "$plugin_dir/state";
$plugin_state_dir initialization
Specifies the location of the plugin state directory.
It will often be the case that plugins will need to save information related to their function during their operation.
All of this information should be saved to the state directory specified here. Like the plugins directory, visitors should not be able to access $plugin_state_dir directly.
Though you can place this directory anywhere that blosxom will have access to it, making '../state/' a subdirectory of the plugins directory makes as much sense as anything else. It is the default, so picking this location requires absolutely no work on your part.
One of the advantages of this arrangement is that you can specify the path here in terms of $plugin_dir.
eg:
$plugin_state_dir = "$plugin_dir/state";
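As a small sketch of why the state path can be written in terms of $plugin_dir: Perl interpolates variables inside double-quoted strings but not inside single-quoted ones. The variable $literal below is only an illustrative name and is not part of blosxom.

```perl
use strict;
use warnings;

my $plugin_dir = "/Library/WebServer/Data/Blosxom/plugins";

# Double quotes interpolate $plugin_dir into the new string.
my $plugin_state_dir = "$plugin_dir/state";

# Single quotes do not interpolate; the variable name is kept literally.
my $literal = '$plugin_dir/state';

print "$plugin_state_dir\n";
print "$literal\n";
```

If you move the plugins directory later, the double-quoted form follows it automatically.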
You should expect that plugins will be fairly tidy when it comes to the use of the state directory.
You will need to create this directory yourself and set up permissions properly. The script will not create it for you.
52. blank line
53. # --- Static Rendering -----
54. blank line
55. # Where are this blog's static files to be created?
56. $static_dir = "/Library/WebServer/Documents/blog";
$static_dir initialization
Specifies the location of the directory where blosxom generates the complete site when run in static mode.
In static mode blosxom generates a complete website, running through all posts and generating every page at once. At the very least running blosxom in static mode generates:
See line 61 for more info.
The value should be the complete path from the filesystem root to, and including, the name of the static directory itself.
The path should begin with a leading slash '/' and should not include a trailing slash '/', though the script will strip a trailing forward slash if it finds one.
You will need to create this directory yourself and set up permissions properly. The script will not create it for you.
A full discussion of static mode here is not necessary but I will talk about it briefly.
Normally any request sent to your weblog runs the blosxom.cgi script. The script takes a nonzero amount of time to execute.
This is in addition to the time it takes for:
For large sites (with lots of posts) it may take a considerable amount of time for the script to work through every post on the site.
aside: blosxom consults the filesystem for info about all posts in every subdirectory of $datadir on every request. As far as computing operations go, accessing the filesystem is slow.
Also, a large or particularly busy site may create a lot of activity for the host computer. This can slow down a site from a visitor's perspective and tax system resources from the perspective of a hosting provider.
Furthermore, there are potential security implications of running any code via a browser, and these concerns are complicated by blosxom's plugin architecture. Even if blosxom.cgi itself is safe, some poorly written or ill-conceived plugin may expose your site to risk.
Finally, it's possible that the configuration of your webserver, or some other restriction imposed by your hosting provider, precludes the use of CGI scripts like blosxom outright.
For these reasons, among others, you may prefer to run blosxom offline, forcing it to generate all of the content on your site at once.
The resulting pages can then be moved to your webserver where they can be served as static content, without the risk or overhead required for the script to operate on every request.
As described static mode may sound appealing. I want to make a point of encouraging you to use blosxom dynamically if appropriate.
Why?
There are simply some things you can do running dynamically that cannot be done statically and you'll find many plugins that will only work in dynamic mode.
Beyond this though, static mode is a bit of a pain to deal with when it isn't absolutely necessary and, to some extent, it doesn't really speak to blosxom's strengths.
Still, static rendering is a great option when necessary.
Some combination of static and dynamic mode operation may be a perfect fit for what you want to do. However, I feel safe in saying that such a mixed configuration is the most difficult type of setup to maintain.
It should be the goal of the project to improve the efficiency of the script and its plugins so that running dynamically is a practical arrangement for the vast majority of installations (of reasonable size/scope of course).
57. blank line
58. # What's my administrative password (you must set this for static rendering)?
59. $static_password = "";
$static_password initialization
Defines the password which must be included as a parameter when calling the script to run in static mode.
A password must be defined here and then used to run blosxom in static mode.
The password serves two different purposes:
First, it is a security measure.
In theory only someone who knows the password you specify here can run blosxom in static mode.
It also acts as a means to enable or disable static mode operation.
Setting the value to "" (the default) disables static mode operation. The presence of the password as a commandline parameter overrides the default and indicates to blosxom that it should run in static mode.
The password you choose should be a good one. There are many different ideas about what makes a password 'good'.
I'll recommend the following:
60. blank line
61. # What flavours should I generate statically?
62. @static_flavours = qw/html rss/;
@static_flavours initialization
Specifies which flavours blosxom should attempt to generate statically.
You may have designed many flavours.
When in static mode blosxom must essentially generate a complete copy of the site separately for each flavour. Also, you may have flavours that depend on dynamic mode operation.
For these reasons, among others, it probably makes sense for you to generate only a small number of flavours statically.
Here you can specify which flavours blosxom will attempt to generate.
You may want to consider sticking with the default and limiting static mode output to html and rss (or whatever flavour you use for syndication feeds).
You can of course add flavours to, remove flavours from, or edit the names of the flavours that appear in this list. Simply separate the flavour names by whitespace between the '/' characters.
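A minimal sketch of what qw// does here: it builds a list of strings by splitting on whitespace, so no quotes or commas are needed. The third flavour name, 'atom', is purely hypothetical and is only there to show how the list would be extended.

```perl
use strict;
use warnings;

# The default from the script, declared with 'my' so the sketch runs on its own.
my @static_flavours = qw/html rss/;

# The same list written out with explicit quotes and commas.
my @same = ('html', 'rss');

# Adding a (hypothetical) flavour is just another whitespace-separated word.
my @more = qw/html rss atom/;

print scalar(@static_flavours), " flavours: @static_flavours\n";
print scalar(@more), " flavours: @more\n";
```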
63. blank line
64. # Should I statically generate individual entries?
65. # 0 = no, 1 = yes
66. $static_entries = 0;
$static_entries initialization
The value of this variable should be set to one of the two numeric values: 1 (one) or 0 (zero).
Think of these values as true and false respectively. This is precisely what they mean.
When running in static mode blosxom always generates:
All of these pages are generated for every static flavour you've specified.
This is already potentially a large number of pages.
Additionally, blosxom can generate a page for each individual post on your site. Realize that this has the potential to create very many pages depending on the number of posts on your site, the depth of your categorization scheme, the number of static flavours you specify in @static_flavours, and possibly other factors.
The value of this variable indicates your preference to the script.
Setting this value to '1' does mean that the process of generating the site in static mode will take longer, and your site will be larger in terms of the amount of drive space it occupies, but it will not increase the amount of work the webserver must do to serve requests for your site. The amount of time to serve any single static page is not strongly related to the total number of pages.
67. blank line
68. # --------------------------------
69. blank line
70. use vars qw! $version $blog_title $blog_description $blog_language $datadir $url %template $template $depth $num_entries $file_extension $default_flavour $static_or_dynamic $plugin_dir $plugin_state_dir @plugins %plugins $static_dir $static_password @static_flavours $static_entries $path_info $path_info_yr $path_info_mo $path_info_da $path_info_mo_num $flavour $static_or_dynamic %month2num @num2month $interpolate $entries $output $header $show_future_entries %files %indexes %others !;
The 'use vars' pragma has been superseded, as of Perl 5.6, by 'our' declarations.
Essentially, either form declares variables as package globals when the 'strict' pragma is in effect. The named variables may be referred to by their unqualified names within the same file and package, and by their fully qualified names from other files or packages.
By using this declaration plugins have access to any of these declared variables.
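The two declaration styles can be compared side by side. This is only a sketch: the package name 'demo' and its two variables are made up for the illustration and do not come from blosxom.

```perl
use strict;
use warnings;

package demo;

# Older style, as used by blosxom: declare a package global via the vars pragma.
use vars qw($title);
$title = "My Weblog";

# Newer style (Perl 5.6 and later): 'our' declares the same kind of package global.
our $description = "Yet another weblog.";

package main;

# From another package, both are reachable by their fully qualified names.
print "$demo::title\n";
print "$demo::description\n";
```

A plugin reading $blosxom::blog_title relies on the same mechanism.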
71. blank line
72. use strict;
Perl pragma that introduces some basic programming restrictions to help guide the developer toward responsible and readable coding.
See..
$ man strict
..for more info.
73. use FileHandle;
Lines 73 - 77 include modules and classes that the script requires during execution.
See the relevant documentation for more info about each of these.
74. use File::Find;
75. use File::stat;
76. use Time::localtime;
77. use CGI qw/:standard :netscape/;
78. blank line
79. $version = "2.0.2";
$version initialization
80. blank line
81. my $fh = new FileHandle;
Declares a new, uninitialized FileHandle, $fh.
We will be using this filehandle to read in our posts (and write pages in static mode).
82. blank line
83. %month2num = (nil=>'00', Jan=>'01', Feb=>'02', Mar=>'03', Apr=>'04', May=>'05', Jun=>'06', Jul=>'07', Aug=>'08', Sep=>'09', Oct=>'10', Nov=>'11', Dec=>'12');
%month2num initialization
First use of %month2num defines the variable as a hash of key/value pairs where keys are short month names like 'Jan', and values are two digit strings e.g. '01' to '12'
Note that nil=>'00' is only a placeholder. It's included to make possible the next line's assignment of %month2num's keys to @num2month so that $num2month[1] is 'Jan'. The placeholder is necessary because there is no month 0.
Two digit values are specified as strings. This ensures that the two digit format is reliable. In other words blosxom prefers to always see '01', and never 1.
84. @num2month = sort { $month2num{$a} <=> $month2num{$b} } keys %month2num;
@num2month initialization
We initialize the array to contain a list of three character strings that represent months by name at corresponding index positions.
Looking at the expressions that contribute to the statement from right to left:
First we take the keys from the %month2num hash.
The result is a list of values 'nil', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'
Next, we sort the list of keys by their value in the hash in ascending numerical order.
We're sorting by the hash values 00, 01, 02, 03, 04, 05,...,12. The strings are automatically converted to corresponding numerical values for the purposes of the comparison.
Finally, we store the sorted list of keys in the array @num2month.
$num2month[0] is 'nil', $num2month[1] is 'Jan',..., $num2month[12] is 'Dec'.
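The two lookups can be tried in isolation. The sketch below copies the data from lines 83 and 84 but declares the variables with 'my' so that it runs on its own under 'strict'.

```perl
use strict;
use warnings;

# Month name to two-digit string, with 'nil' as the placeholder for index 0.
my %month2num = (nil=>'00', Jan=>'01', Feb=>'02', Mar=>'03', Apr=>'04',
                 May=>'05', Jun=>'06', Jul=>'07', Aug=>'08', Sep=>'09',
                 Oct=>'10', Nov=>'11', Dec=>'12');

# Hash keys come back in no particular order, so sort them by their numeric value.
my @num2month = sort { $month2num{$a} <=> $month2num{$b} } keys %month2num;

print "$num2month[1]\n";                 # month name at index 1
print "$num2month[12]\n";                # month name at index 12
print "$month2num{$num2month[3]}\n";     # round trip: index 3 -> name -> '03'
```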
85. blank line
86. # Use the stated preferred URL or figure it out automatically
87. $url ||= url(-path_info => 1);
This is an example of a statement (one of many in the current code) that uses a short-circuit operator, '||=' in this case, as a control structure.
$url can be set manually in the configurable variables section of the code. If it is set, then $url is true and the rest of the line is not evaluated.
If $url is not set manually, then it evaluates to false and the rest of the statement is considered.
So, if you do not specify a $url in the user configurable variables section above (line 26), then url() is called and the result is stored in the variable.
url() is part of the CGI class. From the CGI module's manpage:
url() returns the script's URL in a variety of formats. Called without any arguments, it returns the full form of the URL, including host name and port number.
-path (-path_info) Append the additional path information to the URL. This can be combined with -full, -absolute or -relative. -path_info is provided as a synonym.
We need to look ahead to the comments that run lines 90 - 93 to understand why we're using the -path_info argument.
As it concerns this line, those comments tell us that in some cases url() will always append path_info to the URL. We're not interested in the additional path_info, but we are interested in having a consistent value in $url. Following the 'go with the flow' philosophy so that we can have a dependable value, we include the argument to ensure that we will have the path_info always, not sometimes. Always is easier to deal with than sometimes.
Because we are not interested in the additional path info, we will strip it off soon, lines 94 - 96.
In any case, at this point $url should contain at least the URL of the blosxom script.
e.g.
http://example.net/cgi-bin/blosxom.cgi
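The '||=' behaviour described for this line can be sketched without CGI.pm. In the example below, detect_url() is a hypothetical stand-in for url(), and the counter exists only to show whether the right-hand side was evaluated.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI.pm's url(); it records how often it is called.
my $calls = 0;
sub detect_url { $calls++; return "http://example.net/cgi-bin/blosxom.cgi" }

# Left side is already true, so detect_url() is never called.
my $configured = "http://example.net/blog";
$configured ||= detect_url();

# Left side is the empty string (false), so the fallback value is assigned.
my $blank = "";
$blank ||= detect_url();

print "$configured\n$blank\ncalls: $calls\n";
```

Note that '||=' tests truth, not definedness, so a value of "0" would also trigger the fallback.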
88. $url =~ s/^included:/http:/ if $ENV{SERVER_PROTOCOL} eq 'INCLUDED';
At this point we know $url contains a value. The value should be the full URL including protocol, host, and port number, possibly with additional path info if a path was specified in the request.
This line uses Perl's substitution operator, s///, to 'fix' the string at $url in a specific case.
The substitution is attempted only when $ENV{SERVER_PROTOCOL} is 'INCLUDED'. In that case, if the value of $url begins with the substring 'included:', we replace that prefix with 'http:'.
'^' in the pattern matches the start of the string.
If the value of $url does not match the specified pattern, it is left unaltered.
e.g.
included://some_host_address/cgi-bin/blosxom.cgi
becomes
http://some_host_address/cgi-bin/blosxom.cgi
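A short sketch of the same substitution, run outside a webserver. The variable $server_protocol stands in for $ENV{SERVER_PROTOCOL}, and the second URL is a made-up value chosen to show the effect of the '^' anchor.

```perl
use strict;
use warnings;

# Stand-in for $ENV{SERVER_PROTOCOL}; set by hand for the illustration.
my $server_protocol = 'INCLUDED';
my $url = 'included://some_host_address/cgi-bin/blosxom.cgi';

# Same statement as line 88, using the stand-in variable.
$url =~ s/^included:/http:/ if $server_protocol eq 'INCLUDED';
print "$url\n";

# '^' anchors the match, so the same text elsewhere in a string is untouched.
my $other = 'http://some_host_address/included:/page';
$other =~ s/^included:/http:/;
print "$other\n";
```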
89. blank line
90. # NOTE: Since v3.12, it looks as if CGI.pm misbehaves for SSIs and
91. # always appends path_info to the url. To fix this, we always
92. # request an url with path_info, and always remove it from the end of the
93. # string.
94. my $pi_len = length $ENV{PATH_INFO};
We begin the work of stripping the additional path info from $url (if present) so that we end up with just the base url to the blosxom executable in $url, which is the value we want in the variable.
We get the length of the string that represents the additional path info.
%ENV is a hash that is automatically created and populated with interesting bits of information related to the environment in which the Perl script is running. PATH_INFO is one key of many in this hash of key/value pairs. The value of $ENV{PATH_INFO} is the additional path info from the URL.
e.g.
If a browser requests
http://sample.net/cgi-bin/blosxom.cgi/Technology/Computer/Apple/some_post_aboutApple
then the value of PATH_INFO will be
'/Technology/Computer/Apple/some_post_aboutApple'
$ENV{PATH_INFO} returns only the path info portion of the URL as a string.
length takes the string and returns its length.
We store that numeric length value in the variable $pi_len.
95. my $might_be_pi = substr($url, -$pi_len);
Continuing our work correcting for the inclusion of path info as part of the value at $url.
substr is a Perl function that returns a substring when given a larger string.
substr($url, -$pi_len)
$url is the larger string and we are asking for some portion of it. Specifically $pi_len number of characters from the end of the string.
The '-' in -$pi_len indicates that we want Perl to start counting from the end of the string rather than the beginning.
The return value of substr is the substring itself, which we store in $might_be_pi.
Continuing with the example we started at line 94
If the browser requests
http://sample.net/cgi-bin/blosxom.cgi/Technology/Computer/Apple/some_post_aboutApple
then the value of PATH_INFO will be
'/Technology/Computer/Apple/some_post_aboutApple'
and $pi_len will be 47.
substr($url, -$pi_len) will return, and store in $might_be_pi, the last 47 characters of $url.
In our example, $url is
http://sample.net/cgi-bin/blosxom.cgi/Technology/Computer/Apple/some_post_aboutApple
and the last 47 characters are the path portion of the URL.
So after this assignment, $might_be_pi contains the path portion of the URL, which is, in this case, exactly the value contained in $ENV{PATH_INFO}.
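The example can be checked directly. The sketch below uses a plain variable, $path_info, in place of $ENV{PATH_INFO}, which is an assumption made so that it runs from the command line.

```perl
use strict;
use warnings;

# Stand-in for $ENV{PATH_INFO}.
my $path_info = '/Technology/Computer/Apple/some_post_aboutApple';
my $url = 'http://sample.net/cgi-bin/blosxom.cgi' . $path_info;

# Length of the path info, then that many characters from the end of $url.
my $pi_len = length $path_info;
my $might_be_pi = substr($url, -$pi_len);

print "$pi_len\n";
print "$might_be_pi\n";
```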
96. substr($url, -length $ENV{PATH_INFO}) = '' if $might_be_pi eq $ENV{PATH_INFO};
Continuing our work correcting for the inclusion of path info as part of the value at $url.
This line is an example of a Perl statement modifier, which is simply a more compact way of writing conditional code, an if block in this case.
First we evaluate the condition after the if, which is
$might_be_pi eq $ENV{PATH_INFO}
and if this expression is true then we consider the rest of the statement,
substr($url, -length $ENV{PATH_INFO}) = ''
If the condition is not true, then we skip the rest of the statement.
Because this is the first time we're seeing a statement modifier, let's look at how this statement would look rewritten as a typical if block
if($might_be_pi eq $ENV{PATH_INFO}) {
substr($url, -length $ENV{PATH_INFO}) = '';
}
The value of $might_be_pi is compared to the value at $ENV{PATH_INFO}. If the two values are identical, then the condition is satisfied (and we are satisfied that there is path info included in $url). Having established this, we continue with the rest of the statement.
substr($url, -length $ENV{PATH_INFO}) = '';
There is a lot going on here. Taken together as an idiom, it's easy to talk about what this expression is doing. Teasing it apart to consider how the expression is constructed isn't any more difficult but does take more time and typing.
We've seen substr before, at line 95, and this use is very similar. We've also seen length, at line 94. Here we combine both in one expression.
$ENV{PATH_INFO} is the path info portion of the URL, as already discussed.
Passing this value to length returns the number of characters in the path; again, this has already been discussed.
substr targets a substring of some larger string, in this case $url.
The size of the substring is determined by the second argument to substr, here -length $ENV{PATH_INFO} (the '-', as discussed, means that we are interested in the substring at the end of $url rather than the beginning).
Usually substr returns the specified substring. When combined with an assignment, as it is here, the effect is to replace the substring with the assigned value in the larger string.
So this expression selects the path portion of the string in $url and replaces it with the empty string, ''. In other words, the expression strips the path info from $url.
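Lines 94 - 96 can be tried together in a small sketch. The helper strip_path_info() is hypothetical; it simply wraps the three statements so they can be called with different inputs, with its second argument standing in for $ENV{PATH_INFO}.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the logic of lines 94 - 96.
sub strip_path_info {
    my ($url, $path_info) = @_;
    my $pi_len = length $path_info;
    my $might_be_pi = substr($url, -$pi_len);
    # Assigning to substr() replaces that part of $url in place.
    substr($url, -$pi_len) = '' if $might_be_pi eq $path_info;
    return $url;
}

# The URL ends with the path info, so the path info is removed.
my $with = strip_path_info(
    'http://sample.net/cgi-bin/blosxom.cgi/Technology/Computer/',
    '/Technology/Computer/');

# The URL does not end with the path info, so it is returned unchanged.
my $without = strip_path_info(
    'http://sample.net/cgi-bin/blosxom.cgi',
    '/Technology/Computer/');

print "$with\n$without\n";
```

Comparing before assigning is what makes the operation safe when the URL carries no path info.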
97. blank line
98. $url =~ s!/$!!;
Another use of Perl's substitution operator.
Here the pattern we're looking for is:
/$
a forward slash followed immediately by the end of the string value.
If we find a trailing forward slash, we replace it with nothing.
In other words, this line drops a single trailing forward slash from the value in $url.
99. blank line
100. # Drop ending any / from dir settings
101. $datadir =~ s!/$!!; $plugin_dir =~ s!/$!!; $static_dir =~ s!/$!!;
Drops any single trailing '/' from any of $datadir, $plugin_dir, $static_dir.
This line simply gives the code flexibility to deal with the possibility of users including a trailing '/' in these paths, though blosxom expects that the values do not include the character.
Note that this line combines three short statements on one line.
The lines could be rewritten as
$datadir =~ s!/$!!;
$plugin_dir =~ s!/$!!;
$static_dir =~ s!/$!!;
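A brief sketch of the substitution applied to a few sample paths. The third path, with a doubled trailing slash, is a made-up value included to show that only one slash is removed.

```perl
use strict;
use warnings;

my @dirs = (
    '/Library/WebServer/Documents/blosxom/',   # one trailing slash
    '/Library/WebServer/Documents/blog',       # no trailing slash
    '/tmp/plugins//',                          # hypothetical: two trailing slashes
);

# '!' is used as the delimiter so the '/' in the pattern needs no escaping.
# The statement modifier aliases $_ to each element, so @dirs is edited in place.
s!/$!! for @dirs;

print "$_\n" for @dirs;
```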
102. blank line
103. # Fix depth to take into account datadir's path
104. $depth and $depth += ($datadir =~ tr[/][]) - 1;
First note the use of the short-circuit operator 'and'.
If $depth is zero then the rest of the statement is not evaluated.
$depth will be zero if the user manually set it to zero in the user configurable variables section of the script, which indicates that she wants infinite depth.
Given this meaning (i.e. 0 indicates infinite depth), it makes sense that we wouldn't want to add to the $depth if $depth is 0 (infinity + 1 doesn't make much sense).
If $depth is not zero, we are modifying the value at $depth.
tr/// is Perl's transliteration operator.
It can be used to replace characters in the first list with corresponding characters from the second.
The return value is the number of characters replaced or deleted.
When used as it is here, with an empty second list, it's used to count occurrences of characters in the first list.
So here we're counting forward slashes, '/', in $datadir
This statement adds to $depth a correction for the length of the path to the start of the data directory.
e.g.
$datadir = "/Library/WebServer/Documents/blosxom";
and
$depth = 2;
# indicating that we want blosxom
# to consider posts in the top two levels
# of the weblog hierarchy
This line results in setting $depth to 5, as follows:
2 + 4 - 1 = 5
where 2 is the configured $depth and 4 is the number of forward slashes in $datadir.
This seems to be correct because the root of the weblog is at level 4 (relative to the filesystem root, '/'), and we want to consider the top 2 levels of the weblog hierarchy, i.e. levels 4 and 5.
Note: For the curious (or the suspicious), we add a similar correction anytime we request a path that may be further down the data directory hierarchy. So the value we use for $depth is always relative to the correct starting point.
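The arithmetic can be reproduced in a few lines. This sketch stores the slash count in a separate variable, $slashes, only to make it printable; the script itself does the count inline.

```perl
use strict;
use warnings;

my $datadir = "/Library/WebServer/Documents/blosxom";
my $depth = 2;

# With an empty replacement list, tr changes nothing and returns the count of '/'.
my $slashes = ($datadir =~ tr[/][]);

# Same shape as line 104: the correction is applied only if $depth is non-zero.
$depth and $depth += $slashes - 1;
print "slashes: $slashes, depth: $depth\n";

# A depth of 0 (infinite) short-circuits and is left untouched.
my $infinite = 0;
$infinite and $infinite += $slashes - 1;
print "infinite: $infinite\n";
```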
105. blank line
106. # Global variable to be used in head/foot.{flavour} templates
107. $path_info = '';
$path_info initialization
We're simply initializing $path_info to the empty string, ''.
108. blank line
109. $static_or_dynamic = (!$ENV{GATEWAY_INTERFACE} and param('-password') and $static_password and param('-password') eq $static_password) ? 'static' : 'dynamic';
$static_or_dynamic initialization
The statement sets the variable to one of two values, either 'static' or 'dynamic'.
This choice is made with the help of Perl's ternary operator, ?:
The expression before the ? is evaluated.
If it evaluates as true (non-zero) then the expression to the left of the colon, ':', is evaluated.
Otherwise the expression to the right of ':' is evaluated.
In this line the test expression is:
(!$ENV{GATEWAY_INTERFACE} and param('-password') and
$static_password and param('-password') eq $static_password)
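The expression checks that:
!$ENV{GATEWAY_INTERFACE} - the GATEWAY_INTERFACE environment variable is not set, i.e. the script was not started by a web server as a CGI program
param('-password') - a -password parameter was supplied
$static_password - a static password has been set in the configurable variables section
param('-password') eq $static_password - the supplied password is the same as the configured one
If all of these things are true then $static_or_dynamic will be assigned the value 'static', which is the expression to the left of the colon.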
Otherwise, $static_or_dynamic will be set to the value 'dynamic', the expression to the right of the ':'.
At this point in the script we can check $static_or_dynamic to know in which of the two modes we're operating.
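As a rough illustration of the same test, the sketch below replaces the CGI calls with plain arguments. The function name mode_for and its parameters are hypothetical; they stand in for $ENV{GATEWAY_INTERFACE}, param('-password') and $static_password.

```perl
use strict;
use warnings;

# Hypothetical helper: same boolean chain as line 109, but fed by plain
# arguments instead of %ENV and CGI's param().
sub mode_for {
    my ($gateway_interface, $given_password, $static_password) = @_;
    my $mode = ( !$gateway_interface
                 and $given_password
                 and $static_password
                 and $given_password eq $static_password )
             ? 'static'
             : 'dynamic';
    return $mode;
}

print mode_for(undef,     'secret', 'secret'), "\n";   # command line, matching password
print mode_for('CGI/1.1', 'secret', 'secret'), "\n";   # started by a web server
print mode_for(undef,     'wrong',  'secret'), "\n";   # wrong password
```

Because 'and' stops at the first false operand, the string comparison is only reached when both passwords are present.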
110. $static_or_dynamic eq 'dynamic' and param(-name=>'-quiet', -value=>1);
This is another use of partial evaluation operators as control structures, in this case 'and'.
The first part of the statement
$static_or_dynamic eq 'dynamic'
tests the value of $static_or_dynamic, which we just initialized at line 109.
If the value is 'dynamic', meaning that blosxom has determined that it is running in dynamic mode, then the second part of the statement is evaluated
param(-name=>'-quiet', -value=>1)
which assigns the -quiet parameter a value 1.
This might seem to make sense, a value of 1 indicating that the script should suppress unwanted output when running dynamically, but the parameter doesn't seem to be used anywhere.
This parameter is useful in the case of static mode operation.
From the documentation we have this description:
to have Blosxom's static rendering run silently -- perhaps you're running it automatically at regular intervals and you don't want all that output popping up on your screen or being mailed to you -- add -quiet=1 like so:
% perl blosxom.cgi -password='whateveryourpassword' -quiet=1
Setting -quiet to 1 suppresses output.
111. blank line
112. # Path Info Magic
113. # Take a gander at HTTP's PATH_INFO for optional blog name, archive yr/mo/day
114. my @path_info = split m{/}, path_info() || param('path');
Declaration and initialization of my @path_info
This is another use of a partial evaluation operator as a control structure, in this case '||' (pronounced 'or').
This is the first appearance of the variable my @path_info.
split operates on a single string.
It works by splitting the string it's given into a list of values at the specified (matched) character.
The specified character in this case is forward slash, '/'.
What string are we asking split to work on?
First we run path_info(), another function from the CGI module.
from the documentation for CGI:
path_info() Returns additional path information from the script URL. E.G. fetching /cgi-bin/your_script/additional/stuff will result in $query->path_info() returning
"/additional/stuff".
So path_info() returns everything in the address after the script itself.
For blosxom this part of the address corresponds to the data directory or date-based archive hierarchies.
e.g.
For the request
http://sample.net/cgi-bin/blosxom.cgi/Technology/AppleInc/Hardware/Macintosh
path_info() returns
'/Technology/AppleInc/Hardware/Macintosh'
which we split on '/' so that the array @path_info contains the list
'', 'Technology', 'AppleInc', 'Hardware', 'Macintosh'
Note that the initial list item is the empty string, ''.
This value is introduced when splitting on the leading '/', i.e. ^/Technology.
The next line takes care of removing this unwanted element.
The use of '||' means that only if path_info() returns a false value (i.e. no path was specified) will the second part of the statement be evaluated, which passes the value of the path parameter to the split operator.
If both path_info() and param('path') are empty, then @path_info is empty. This will happen whenever a browser makes a request for the root of the weblog.
e.g.
a request for
http://sample.net/cgi-bin/blosxom.cgi
115. shift @path_info;
shift operates on the beginning (lowest index value) of any array, such as @path_info in this case.
shift returns the value at the first index position, removing it from the array.
Here, we're doing nothing with the return value, so the effect is simply to remove the first element from the array.
Continuing the discussion started at line 114 above, if the array contains the list
'', 'Technology', 'AppleInc', 'Hardware', 'Macintosh'
We shift off the empty string value, leaving only the elements we want
'Technology', 'AppleInc', 'Hardware', 'Macintosh'
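The split and shift steps from lines 114 and 115 can be tried in isolation. This sketch hard-codes the example path instead of calling path_info(); the helper variables $count_before, $dropped and @root_request exist only for the illustration.

```perl
use strict;
use warnings;

# Stand-in for the value path_info() would return for the example request.
my @path_info = split m{/}, '/Technology/AppleInc/Hardware/Macintosh';
my $count_before = @path_info;      # includes the leading empty string

# Line 115: discard the empty first element created by the leading '/'.
my $dropped = shift @path_info;

# A request for the root of the weblog: splitting an empty string
# produces an empty list.
my @root_request = split m{/}, '';

print "before=$count_before after=", scalar(@path_info), "\n";
print "remaining: @path_info\n";
```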
116. blank line
117. while ($path_info[0] and $path_info[0] =~ /^[a-zA-Z].*$/ and $path_info[0] !~ /(.*)\.(.*)/) { $path_info .= '/' . shift @path_info; }
While loop, condition and body
This is a while loop operating on the newly created @path_info array.
Yet again we have a logical partial evaluation operator involved in the control of the script's execution.
The first part of the condition,
$path_info[0]
simply checks that $path_info[0] (the first element of @path_info) does in fact contain a value. Remember that @path_info may be empty.
If it evaluates as false, because @path_info is empty, then we drop out of the loop - or skip the loop entirely if $path_info[0] is empty the first time the condition is tested.
On the other hand, if $path_info[0] is true, the next part of the condition is evaluated.
$path_info[0] =~ /^[a-zA-Z].*$/
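Here we're attempting to match the value at $path_info[0] to the given pattern.
The pattern specifies:
^ - the start of the string
[a-zA-Z] - exactly one letter, upper or lower case
.* - any number of any character (other than \n)
$ - the end of the string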
So the pattern matches if the value at $path_info[0] starts with a letter.
If the pattern matches, we evaluate the last part of the condition.
$path_info[0] !~ /(.*)\.(.*)/
This is another pattern match, but here we succeed if we do not match the pattern provided (compare !~ and =~)
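The pattern specifies:
(.*) - any number of any character (other than \n)
\. - a literal dot
(.*) - any number of any character (other than \n)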
The pattern matches any string that contains a dot, which when working with the elements of a path, as we are here, might loosely describe a filename.
This might lead you to suspect that category names in blosxom should not contain dots, '.', even though your operating system may allow it.
You would be right.
In summary:
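The body of the loop runs if:
$path_info[0] has a true value (the array is not empty), and
$path_info[0] starts with a letter, and
$path_info[0] does not contain a dot.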
Note: From the documentation we know that directory names, used to categorize entries, must not start with a digit.
Now we come to the body of the while loop
$path_info .= '/' . shift @path_info;
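This statement is doing two things.
First it removes the first element of @path_info (shift) and prepends a forward slash to it; then it appends the resulting string to $path_info.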
e.g.
If @path_info contains the list 'Technology', 'AppleInc', 'Hardware', 'Macintosh'
Then we remove 'Technology' and prepend a leading forward slash:
'Technology' becomes '/Technology'
@path_info is left with the list 'AppleInc', 'Hardware', 'Macintosh'.
Next, we take the string we just created, append it to the end of the current value of $path_info, and assign that new string back to $path_info with the single operator (.=).
The first time the loop is run $path_info is empty (we initialize it to the empty string in line 107).
What does the loop accomplish?
We build up in the string $path_info all of the path information given to blosxom: everything in the address following the script itself, not including the name of the file, if present.
Note that because we shift the array every time we repeat the loop, the value of $path_info[0] changes each time through. Eventually we'll shift off all elements of the array and drop out of the loop. But keep in mind that we will drop out of the loop if any of the subexpressions of the while condition are false. Those conditions again are:
$path_info[0] has a true value
$path_info[0] starts with a letter
$path_info[0] does not contain a dot
Note that the entire while loop, including the loop body, is limited to this single line.
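To see the loop stop early, here is a small sketch that runs the same while statement over a hand-built array. The request it models (two categories, a year, a month and an index file) is an assumption chosen for the example, and the variables are lexical copies rather than the script's globals.

```perl
use strict;
use warnings;

# What @path_info might hold after the shift at line 115 for a request
# such as .../blosxom.cgi/Technology/AppleInc/2006/12/index.rss
my @path_info = ('Technology', 'AppleInc', '2006', '12', 'index.rss');
my $path_info = '';

# Line 117, reformatted over several lines.
while ( $path_info[0]
        and $path_info[0] =~ /^[a-zA-Z].*$/
        and $path_info[0] !~ /(.*)\.(.*)/ ) {
    $path_info .= '/' . shift @path_info;
}

print "path_info: $path_info\n";
print "left over: @path_info\n";
```

The loop consumes the two category names and stops at '2006', which does not start with a letter, so the date parts and the filename remain in the array for the lines that follow.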
118. blank line
119. # Flavour specified by ?flav={flav} or index.{flav}
120. $flavour = '';
$flavour initialization
$flavour is a package global
121. blank line
122. if ( $path_info[$#path_info] =~ /(.+)\.(.+)$/ ) {
Start of if block and condition
After the while loop, line 117, @path_info array may still contain elements.
It will contain a list of all elements following, and including, the first beginning with a digit, and possibly the filename of a specific post or of the name of an index page (e.g. index.html) if either was included in the request.
This conditional attempts to match the last element of the @path_info array,
specified by $path_info[$#path_info],
to the pattern (.+)\.(.+)$
The pattern will match something that looks like a filename, i.e. literally any string made up of at least one of any character (other than \n), followed by a dot, followed by at least one of any character (other than \n) up to the end of the string.
If the last element remaining in the array is a filename (if indeed there is a last element at all - remember the array could be empty), then we execute the body of the conditional.
123. $flavour = $2;
Note the use of the parentheses in the pattern (.+)\.(.+)$
The pattern takes advantage of Perl's memory variables.
Each set of parens in the pattern corresponds to a variable containing the portion of the string that matched the pattern inside the parentheses. The variables are named $1, $2, $3... etc, one per set of parens in the pattern, in order from left to right. Note that nesting parens does affect the names assigned to these variables. In this case it's simple: there is no nesting, so $1 holds the name and $2 holds the extension.
This line assigns the value of $2 to the package global $flavour, which should make sense. If there is a filename provided, and it includes an extension, then the extension specifies the requested flavour.
124. $1 ne 'index' and $path_info .= "/$1.$2";
If we see something in the form of a filename (name.extension), then name may be a filename or 'index', which is a request for a listing of all entries in the given category.
Here we use another partial evaluation operator as a flow control statement as follows:
If $1 (the name portion of name.extension) does not equal (ne) 'index', then $1.$2 looks like a filename and not a request for an index page.
In this case the line appends the filename to the end of $path_info, after prepending a forward slash '/' as a delimiter.
So a legitimate filename is included as part of $path_info but not an index request.
125. pop @path_info;
Remember that shift operates at the beginning of an array (lowest index values), pop works at the end of the array.
Keep in mind that we are inside of a conditional block and only executing this line if the last element of @path_info is either a filename or index request.
We pop and discard this value. Before now we've used the value to pick up the requested flavour (if present) and append the requested filename (if present) to $path_info.
126. } else {
End of the if block and start of the else clause
127. $flavour = param('flav') || $default_flavour;
The else clause executes only in the case that the if does not.
The if body will not execute unless the last element of @path_info matches the pattern 'name.extension'.
If that match fails, we won't have assigned to $flavour before this point (something that happens inside the if block).
This line first looks for the 'flav' parameter passed in the request (i.e. ?flav=html), which is an alternative (deprecated!) method of specifying the flavour, before resorting to the $default_flavour, specified in the configurable variables section of blosxom.cgi itself.
At this point we know that $flavour has a value, even if it is the default.
128. }
end of else clause started at line 126
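The if/else at lines 122 to 128 can be exercised with a small sketch. The helper resolve_flavour, its argument order, and the extra "@rest and" guard are assumptions made for this illustration; the script itself works directly on its global variables and on param('flav').

```perl
use strict;
use warnings;

# Hypothetical wrapper around the logic of lines 122-128.
# $flav_param stands in for param('flav'); @rest stands in for what is
# left of @path_info after the while loop at line 117.
sub resolve_flavour {
    my ($path_info, $flav_param, $default_flavour, @rest) = @_;
    my $flavour;

    # Guard added for this sketch so an empty list is never matched.
    if ( @rest and $rest[$#rest] =~ /(.+)\.(.+)$/ ) {
        $flavour = $2;
        $1 ne 'index' and $path_info .= "/$1.$2";
        pop @rest;
    } else {
        $flavour = $flav_param || $default_flavour;
    }
    return ($path_info, $flavour, @rest);
}

# Index request with a date: flavour comes from the extension.
my ($p1, $f1, @r1) = resolve_flavour('/Technology', undef, 'html', '2006', 'index.rss');
# Single entry: the filename is appended to the path.
my ($p2, $f2, @r2) = resolve_flavour('/Technology', undef, 'html', 'first_post.html');
# No filename at all: fall back to the default flavour.
my ($p3, $f3, @r3) = resolve_flavour('/Technology', undef, 'html');

print "$p1 $f1 (@r1)\n$p2 $f2 (@r2)\n$p3 $f3 (@r3)\n";
```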
129. blank line
130. # Strip spurious slashes
131. $path_info =~ s!(^/*)|(/*$)!!g;
Using Perl's substitution operator this line matches and then discards any/all slashes at the beginning and at the end of the string in $path_info.
s!(^/*)|(/*$)!!g
^/* - matches any number of '/' immediately following the start of the string.
| - alternation. This character in the pattern instructs Perl's regex engine to match on either of the two subpatterns here.
/*$ - matches any number of '/' immediately preceding the end of the string.
/g - this, the global modifier, tells Perl to continue with all possible substitutions rather than stopping with the first match.
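A quick way to convince yourself of what this substitution does is to run it on a string with extra slashes at both ends. The sample value below is made up for the illustration.

```perl
use strict;
use warnings;

# Made-up value with spurious slashes at both ends.
my $path_info = '//Technology/AppleInc/';

# Line 131: remove leading and trailing slashes; inner slashes are kept.
$path_info =~ s!(^/*)|(/*$)!!g;

print "$path_info\n";
```

Without the /g modifier only the leading slashes would be removed, because the substitution would stop after its first match.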
132. blank line
133. # Date fiddling
134. ($path_info_yr,$path_info_mo,$path_info_da) = @path_info;
Remember that we recently popped off the filename or index request (if present).
Previously we shifted off any elements of @path_info that came before the first value starting with a digit.
The only (valid) elements remaining must be part of a request to blosxom's date-based archive scheme.
e.g.
If the browser request was:
http://sample.net/cgi-bin/blosxom.cgi/2006/12/31
then @path_info at this point contains the list "2006", "12", "31".
Note that it is not necessary for a browser to specify all of year, month, and day in a request, but it's also not valid to skip date values, e.g. specifying year and day but not month.
There will be as few as 0 elements remaining in @path_info and as many as three.
All elements remaining will be in the order of: year, month, day
This line assigns each of the (possibly) remaining elements to corresponding variables.
If @path_info does not contain one or more of these values, the corresponding variables will be assigned the value undef.
We can check for this value later in the code.
135. $path_info_mo_num = $path_info_mo ? ( $path_info_mo =~ /\d{2}/ ? $path_info_mo : ($month2num{ucfirst(lc $path_info_mo)} || undef) ) : undef;
The next line may appear at first to be a bit confusing. This is the first time we see a nested use of the ternary operator.
Let's take it one step at a time working from left to right.
$path_info_mo_num =
Tells us we'll be assigning something to the package global $path_info_mo_num.
$path_info_mo
First we evaluate the variable $path_info_mo by itself. This evaluation will return true if the variable was assigned a value in line 134, and it will return false if the value of $path_info_mo is undef.
If true we evaluate the expression to the left of the colon, ':', and if false we evaluate the expression to the right.
Taking the false case first
Be careful not to get confused by the nested use of the ?: operator.
The ':' that pairs with the first '?' is all the way at the end of the statement.
The value to the right of ':' is simply 'undef'.
So if $path_info_mo is undef, meaning that we do not have a month as part of the browser request, then $path_info_mo_num is also assigned the value of 'undef' and we're finished with this line.
If $path_info_mo evaluates to true then we consider the expression
( $path_info_mo =~ /\d{2}/ ? $path_info_mo : ($month2num{ucfirst(lc $path_info_mo)} || undef) )
First we evaluate
$path_info_mo =~ /\d{2}/
We know that $path_info_mo has some value at this point and here we try to match that value against the pattern
\d{2}
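which matches two consecutive digits. Note that the pattern is not anchored, so it succeeds on any value that contains two digits in a row.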
Taking the true case first.
If the match succeeds, then the browser request included a two digit month number. Of course this number could be any two digit value e.g. 67, which is an unusual month.
In this case, we assign $path_info_mo to $path_info_mo_num and we're finished with the line.
If the match fails then we consider the expression:
($month2num{ucfirst(lc $path_info_mo)} || undef)
Again we have the || operator used for flow control.
$month2num{ucfirst(lc $path_info_mo)}
Working from right to left, the expression first converts the value in $path_info_mo to lowercase
(lc $path_info_mo)
That string is then passed to the function ucfirst, which capitalizes the first character of the string
(ucfirst(lc $path_info_mo))
At this point, if the value of $path_info_mo is a string, we know the first letter will be capitalized and the rest of the string will be lowercase.
Now we use that string as a key in the %month2num hash we defined earlier, line 82.
The keys in %month2num look like 'Jan', 'Feb', 'Mar', etc.
If we find the key, then the expression
$month2num{ucfirst(lc $path_info_mo)}
evaluates to the value at the appropriate key in the hash; this value (a two digit month) is assigned to $path_info_mo_num and we're finished with the line.
If the key does not exist, because the value is anything other than 'Jan', 'Feb', ... 'Dec', then the expression evaluates to false and the second half of the || is considered, which results in undef being assigned to $path_info_mo_num.
Summary:
If $path_info_mo is either a two digit value or the name of a month in the form of 'Jan', 'Feb', ..., 'Dec', something that is a key in the month2num hash, then $path_info_mo_num is assigned an appropriate two digit value, otherwise it is undef.
For example:
http://sample.net/cgi-bin/blosxom.cgi/2006/12/31
or
http://sample.net/cgi-bin/blosxom.cgi/2006/Dec/31
would both result in $path_info_mo_num being assigned the value '12' at this line.
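The nested ternary is easier to follow when wrapped in a function and called with a few values. In this sketch the function name month_num is hypothetical, and the %month2num hash is a local copy containing only the twelve month keys described above.

```perl
use strict;
use warnings;

# Local copy of the lookup table, limited to the keys 'Jan' .. 'Dec'.
my %month2num = (
    Jan => '01', Feb => '02', Mar => '03', Apr => '04',
    May => '05', Jun => '06', Jul => '07', Aug => '08',
    Sep => '09', Oct => '10', Nov => '11', Dec => '12',
);

# Hypothetical wrapper around the expression at line 135.
sub month_num {
    my ($mo) = @_;
    return $mo
        ? ( $mo =~ /\d{2}/ ? $mo : ( $month2num{ ucfirst(lc $mo) } || undef ) )
        : undef;
}

for my $mo ('12', 'Dec', 'dEC', 'December', '67') {
    my $num = month_num($mo);
    print "$mo => ", (defined $num ? $num : 'undef'), "\n";
}
```

Note that 'December' yields undef because only the three-letter forms are keys in the hash, while '67' passes through unchanged.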
136. blank line
137. # Define standard template subroutine, plugin-overridable at Plugins: Template
138. $template =
This looks like any other assignment statement at this point.
We'll see on the next line that what we're assigning to the variable is actually a reference to a subroutine.
139. sub {
The start of the anonymous subroutine that will serve as the default template routine
140. my ($path, $chunk, $flavour) = @_;
Most Perl subroutines start with a line like this naming the expected parameters.
Here we see that the $template subroutine expects 3 parameters
$path, which is the path corresponding to the browser request.
The script starts looking for template files close to the requested file/directory and works up toward the root of the data directory.
$chunk, which is a particular piece of the template (i.e. one of 'date', 'content_type', 'head', 'story', 'foot').
$flavour, the requested $flavour.
Note that here we're declaring $flavour as a new lexical (my) variable, available inside this subroutine only.
This variable will mask the package global with the same name inside the subroutine's block.
General point:
We will often see the same variable name used more than once in the source. Rules of scope determine which variable, of potentially many with the same name, is used when a variable is referred to by name.
141. blank line
142. do {
Start of do/while loop
143. return join '', <$fh> if $fh->open("< $datadir/$path/$chunk.$flavour");
Because this line is part of the body of a do/while loop, it will be executed some number of times, as determined by the evaluation of the while condition that follows (line 144), but it must be executed at least once before the while condition is tested.
This line does a few things we haven't seen before.
It is an example of a Perl expression modifier, which is simply a more compact way of writing a block of code, an if block in this case.
First we tell perl what we'll do (typically the body of an if block or while loop)
return join '', <$fh>
and next we specify that we'll execute the preceding expression only if the second part of the statement evaluates as true (this is the condition)
$fh->open("< $datadir/$path/$chunk.$flavour")
The evaluation of the first expression depends on the value (true or false) of the second, so let's start with the second expression.
What is the second part of the statement doing?
$fh->open("< $datadir/$path/$chunk.$flavour")
It pieces together $path, $chunk, and $flavour with $datadir, the manually defined user configurable variable, to form a path leading to a particular template file.
The expression attempts to open this file for reading, and will return true if the request to open the file succeeds. It will return false otherwise.
If we're able to read the template file requested, then we read the entire file
<$fh>
combines the filehandle we just opened with Perl's line input operator '< >'.
In list context this returns all of the contents of the file as a list of values, where each list element is a line from the file.
We join the list together separated by the empty string '' (i.e. nothing), and return the resulting string to the caller.
Summary:
This line will run at least once (do/while) and attempts to open the file requested for reading. If successful, the line returns the contents of the specific file requested to the caller as a string.
144. } while ($path =~ s/(\/*[^\/]*)$// and $1);
Now we get to the condition that determines the number of times we'll run the body of the do block. Again, we know the loop will run at least once.
After the first run and before every subsequent run we evaluate the condition
$path =~ s/(\/*[^\/]*)$// and $1
The statement begins with another substitution.
$path =~ s/(\/*[^\/]*)$//
Let's replace the delimiters used here to make this easier to understand
s!(/*[^/]*)$!!
Note especially the presence of '$', which tells us that we're attempting to match at the end of the string value in $path.
What exactly does this pattern match?
Any number of '/' characters followed immediately by any number of any character other than '/'
'^', when included as the first element of a character class, negates the class.
Because we're substituting nothing (!!) we're dropping the matched portion of the value in $path.
So, each time we evaluate the condition we're dropping the last part of the $path passed to the function.
If this match succeeds, the second part of the statement is evaluated,
$1
which will be true only if the portion of the pattern contained within the parentheses matches something. This is necessary because as constructed the pattern match will succeed even if nothing is matched.
\/*[^\/]*$
Matches zero or more occurrences of '/' followed by zero or more [^\/] (any character other than '/'). Because we're matching even on zero occurrences, the overall pattern match will always succeed.
However, $1 will only evaluate as true if some portion of the original string matched the pattern within the parentheses. So once $path has been reduced to the empty string, the pattern match will still succeed, but $1 will be empty and evaluate as false, allowing us to drop out of the loop.
Summary:
We start looking for a template file at the end of the path passed to the function.
If ever we succeed in finding a template file and it can be successfully opened for reading, then we immediately return from the subroutine.
Until we find and succeed at opening the requested template we work up the provided $path, trying to open the requested template file at each successive level.
If we never succeed in opening the requested template file, we'll eventually drop out of the loop after exhausting $path completely, and failing the test condition of the while loop.
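The way the while condition walks up the path can be watched without touching the filesystem. In the sketch below the attempt to open a template file is replaced by recording the current value of $path; the @tried array and the sample path are assumptions for the illustration.

```perl
use strict;
use warnings;

my $path  = 'Technology/AppleInc/Hardware';
my @tried;

do {
    # The real subroutine tries to open "$datadir/$path/$chunk.$flavour"
    # here; this sketch only records which $path would be tried.
    push @tried, $path;
} while ( $path =~ s/(\/*[^\/]*)$// and $1 );

print "tried: '$_'\n" for @tried;
```

The final attempt is made with an empty $path, which in the real subroutine corresponds to looking in the root of the data directory.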
145. blank line
146. # Check for definedness, since flavour can be the empty string
147. if (defined $template{$flavour}{$chunk}) {
If the script executes this statement, we must not have been successful in finding the requested template file along the $path, or we would have returned from the subroutine by now.
Now what do we do?
We're looking for a requested template file and we haven't found what we're looking for along $path, but we're not out of options yet.
Here, we first try to find the requested template in a %template hash, which holds a set of templates baked-in to the script itself. (We haven't seen this hash defined yet -it's coming up shortly.)
%template is a hash of hashes keyed by flavour.
That is, %template may contain a number of templates, each a hash of key/value pairs, where each pair is the name of a template component (the key) and the contents of that component (the value).
For example
If we were not successful in finding a head.html template somewhere in the $path, next we check the %template hash for $template{html}{head}.
If $template{$flavour}{$chunk} exists (is defined), then we move on to the statements in the body of the if block -see the next line.
If this hash key is not defined, because there is no corresponding entry in the %template hash, then the body of the block is skipped and we continue.
148. return $template{$flavour}{$chunk};
If $template{$flavour}{$chunk} exists, this statement returns the value (the contents of the requested component from the baked-in templates) to the caller, and we're done with this call to the template routine.
149. } elsif (defined $template{error}{$chunk}) {
If the condition tested at line 147 fails, because there is no corresponding entry in the %template hash, then we move on to the elsif clause and evaluate its condition.
This expression again looks to the %template hash, but now we've given up on finding the requested flavour, and we're only looking for the specified component from a baked-in error template.
We're simply trying to return an error message specific to the component requested, indicating that the requested flavour is not available.
If $template{error}{$chunk} exists (i.e. is defined), then we move on to the statements in the body of the elsif block -see the next lines.
If even this fails, the body of the block is skipped and we continue. When will this happen? Only when the component requested ($chunk) is not one of the valid choices (head, story, foot, date, content_type).
150. return $template{error}{$chunk}
If $template{error}{$chunk} exists, this statement returns the value (the contents of the requested component from the baked-in error template) to the caller, and we're done with this call to the template routine.
151. } else {
End of the block that is the body of the previous elsif clause and the beginning of the default else clause. If none of the conditions tested in the preceding if and elsif blocks evaluate as true then the statements of this block will (definitely) be evaluated.
Of course, if any of the previous conditions succeeded then the statements here are not executed, in fact we've already returned from the $template routine.
152. return '';
If even $template{error}{$chunk} is not defined, then the component requested, $chunk, is not one of the valid choices (head, story, foot, date, content_type).
This statement returns the empty string ('') to the caller.
At this point we're sure to have returned from the subroutine. At the very least, we return nothing at all, but we do return.
153. }
End of the else block
154. };
End of the $template subroutine definition
Keep in mind that this is (finally) the end of the assignment statement that began at line 138.
Summary
The subroutine expects three parameters ($path, $chunk, $flavour) and attempts to open the corresponding template file.
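The fallback order in the second half of the subroutine (requested flavour, then the error flavour, then the empty string) can be sketched on its own. The helper name baked_in and the tiny %template contents below are made up for the example; the real hash is filled from the __DATA__ section.

```perl
use strict;
use warnings;

# Made-up stand-in for the baked-in templates.
my %template = (
    html  => { head => "<html>\n" },
    error => { head => "Error: unknown flavour\n" },
);

# Hypothetical helper reproducing lines 147-153.
sub baked_in {
    my ($flavour, $chunk) = @_;
    if ( defined $template{$flavour}{$chunk} ) {
        return $template{$flavour}{$chunk};
    } elsif ( defined $template{error}{$chunk} ) {
        return $template{error}{$chunk};
    } else {
        return '';
    }
}

print baked_in('html', 'head');    # known flavour and chunk
print baked_in('atom', 'head');    # unknown flavour: error template
print "[", baked_in('atom', 'bogus'), "]\n";   # unknown chunk: empty string
```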
155. # Bring in the templates
156. %template = ();
Initialization of the %template hash.
It is initially empty.
157. while (<DATA>) {
We've seen while loops before, and this is just the beginning of another. On the other hand...
<DATA>
is new.
Take a quick look at line 445.
You'll find a line that looks like this:
__DATA__
followed by a number of lines that specify various flavours, chunks, and text that look suspiciously like they might be the content of template components.
__DATA__ is a marker used as a pseudo-datafile.
It is used to define a group of lines treated by Perl as if they were contained in a file.
Perl can open this pseudo-datafile for reading like any other file (Of course it would make no sense to write to DATA!).
The while loop here works as if DATA were in fact an open filehandle.
The while loop will, at each evaluation of DATA, read in another of the lines following __DATA__ at line 445 and run the body of the loop.
158. last if /^(__END__)$/;
After reading a series of lines, one at a time, and working through the body of the while loop, we'll eventually encounter
__END__
at line 461 (go take a look), at which point this statement drops out of the while loop.
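Specifically, the line tries to match the current line to the pattern ^(__END__)$ which specifies:
^ - the start of the line
(__END__) - the literal text __END__, captured by the parentheses
$ - the end of the line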
This is the second expression modifier we've encountered.
The expression
last
is evaluated only if the pattern matches.
If you're confused, just realize that this line could be rewritten as:
if(/^(__END__)$/) {
last;
}
When evaluated, last drops out of the loop.
159. my($ct, $comp, $txt) = /^(\S+)\s(\S+)(?:\s(.*))?$/ or next;
We know we're working with one line below the __DATA__ marker at line 445, and above __END__ at line 461.
If we get past line 158 then we know the current line is not yet
__END__
Notice the three variables to the left of the assignment statement, and the three sets of parentheses in the pattern.
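We're attempting to assign each parenthesized portion of the matched line to one of the variables.
^(\S+)\s(\S+)(?:\s(.*))?$
Literally this pattern is:
^ - the start of the line
(\S+) - one or more nonspace characters, captured as $1
\s - a single whitespace character
(\S+) - one or more nonspace characters, captured as $2
(?:\s(.*))? - optionally, a single whitespace character followed by the rest of the line, which is captured as $3; the (?: ) grouping itself does not capture anything
$ - the end of the line
So if, for the moment, we define a word as a sequence of nonspace characters then we're trying to capture:
the first word of the line, in $ct
the second word of the line, in $comp
everything after the second word, if anything, in $txt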
If we look at that __DATA__ section we see:
$ct will be one of 'html', 'rss', or 'error'
$comp will be one of 'content_type', 'head', 'story', 'date', or 'foot'
$txt will be the rest of the line, essentially the complete text that makes up one portion of one of the baked-in template files.
$ct is essentially a flavour (e.g. html),
$comp, a component (chunk) of one of the flavours (e.g. head), and
$txt the contents of one chunk of one of the flavours, specifically the text of component $comp of flavour $ct (e.g. the contents of the baked-in head.html).
Notice that there is no conditional here.
We know these matches will succeed for as long as the while loop continues because the lines we're working with have been constructed this way. We're not making an external call to the system or depending on user input, either of which would give us cause to doubt the format of the strings.
The last bit of the statement is:
or next
This is the logical operator 'or', used here as a partial evaluation operator. The first expression
my($ct, $comp, $txt) = /^(\S+)\s(\S+)(?:\s(.*))?$/
will evaluate as true as long as the pattern matches. In list context a successful match returns one value per set of capturing parentheses, and a list assignment tested as a condition yields the number of values on its right-hand side. So a successful match gives 3, even when the optional third group matched nothing and $txt is left undefined. If the pattern fails to match, the list is empty, the value is 0, and we evaluate the rest of the statement
next
which skips the rest of the while loop and immediately continues with the next line at the top of the loop. If we haven't matched then we don't want to continue processing the current line.
Note that this shouldn't be much of an issue unless there is a mistake in the __DATA__ section of the source file.
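The value produced by the list assignment can be inspected directly. In this sketch the pattern is stored in $re and matched against three made-up lines; the variable names with numeric suffixes exist only so that each case can be examined afterwards.

```perl
use strict;
use warnings;

my $re = qr/^(\S+)\s(\S+)(?:\s(.*))?$/;

# A list assignment in scalar context yields the number of values on its
# right-hand side, which is what 'or next' tests.
my $full    = ( my ($ct1, $comp1, $txt1) = 'html head <html>' =~ $re );
my $partial = ( my ($ct2, $comp2, $txt2) = 'html head'        =~ $re );
my $none    = ( my ($ct3, $comp3, $txt3) = 'nospaces'         =~ $re );

print "full=$full partial=$partial none=$none\n";
print "txt2 is ", (defined $txt2 ? 'defined' : 'undef'), "\n";
```

A line with only two words still counts as a match, with the third variable left undefined; only a line that fails the pattern altogether triggers next.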
160. $txt =~ s/\\n/\n/mg;
A simple substitution, not unlike the others we've seen, with a small but significant exception.
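Here we're replacing each occurrence of the two-character sequence \n (a literal backslash followed by the letter n, written \\n in the pattern) with an actual newline character.
The goal is to replace the escaped newline sequences stored in the template text with real newlines (we're unescaping the newlines).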
There are two modifiers used:
g - We've already seen that this modifier makes the substitution global, so that it matches every occurrence of the pattern in the string, not just the first.
m - This one is new. Normally patterns match against the entire string, for example '^' and '$' refer to the beginning and the end of the entire string, not individual lines within the string. Because $txt is a string composed of multiple lines this could be a problem.
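The m modifier tells Perl's regex engine to consider internal lines (e.g. matches against boundary anchors at the beginning and end of internal lines), and not just the beginning and end of the entire string. This particular pattern contains neither '^' nor '$', so the modifier has no practical effect on this substitution.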
161. $template{$ct}{$comp} .= $txt . "\n";
Using the three variables we've just assigned, this statement goes about the business of building the %template hash structure -see discussion of line 147.
Later we'll be able to default to these baked-in templates by referring to the hash.
When looking for $chunk, and $flavour in the $template subroutine we'll be able to use those variables to get at the contents of the %template hash we're building up here.
$template{$flavour}{$chunk} in the $template routine is equivalent to $template{$ct}{$comp} here.
To the value at {$ct}{$comp} we append $txt (concatenation) and end the value with a newline.
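The loop body discussed over the last few lines can be sketched on its own. The sample lines are invented, and the match pattern is an assumption modeled on the three-variable match described above, not a quotation of the script.

```perl
use strict;
use warnings;

my %template;

# Hypothetical input lines in a "content-type component text" layout.
my @data = (
    'html head <h1>$blog_title</h1>',
    'html head <hr />',
    'rss head <rss version="0.91">',
    'this-line-has-no-fields',
);

for (@data) {
    # Either all three fields are captured or the line is skipped.
    my ($ct, $comp, $txt) = /^(\S+)\s(\S+)\s(.*)$/ or next;
    $txt =~ s/\\n/\n/mg;

    # Append to whatever is already stored under this pair of keys.
    $template{$ct}{$comp} .= $txt . "\n";
}

print $template{html}{head};
```

Because of the .= operator, the two 'html head' lines accumulate into a single two-line value, and the intermediate hash under 'html' is created automatically the first time it is used.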
162. }
End of while loop started at line 157.
163. blank line
164. # Plugins: Start
165. if ( $plugin_dir and opendir PLUGINS, $plugin_dir ) {
Start of conditional block
Before we can start to work with plugins, this line checks that $plugin_dir has been specified ($plugin_dir is a user configurable variable and is initially the empty string, '').
If it has any value other than the empty string (or '0'), whether or not the value corresponds to a valid path, $plugin_dir evaluates as a true value.
Next, the script attempts to open the plugin directory, associating it with the directory handle PLUGINS
opendir PLUGINS, $plugin_dir
Only if $plugin_dir is not '' and we are able to successfully open the directory do we execute the code in the body of the conditional.
If one of the two expressions fails, then we skip the block entirely but continue with the execution of the script.
166. foreach my $plugin ( grep { /^\w+$/ && -f "$plugin_dir/$_" } sort readdir(PLUGINS) ) {
Start of foreach loop and condition
This loop makes up the bulk of the conditional block. Let's take it a piece at a time, working from right to left
readdir(PLUGINS)
reads the contents of the directory returning all items found in the directory in the same order you would get from running ls -f on the directory, including dot files and subdirectories.
sort readdir(PLUGINS)
This list is sorted in ascending ASCIIbetical order (the default for sort on ASCII strings).
The sorted list is handed over to grep, which picks items from the list according to the expression following the operator in {},
grep { /^\w+$/ && -f "$plugin_dir/$_" } sort readdir(PLUGINS)
/^\w+$/ && -f "$plugin_dir/$_".
An entry gets past grep (is included in the list returned from grep) only if it satisfies two conditions, joined by &&.
It must match the pattern /^\w+$/,
which specifies one or more word characters (\w+) spanning the entire name, from its beginning (^) to its end ($).
So, we're looking only for entries that contain letters, digits, and '_' characters (i.e. characters that match \w).
This eliminates any dot files, along with other odd filenames.
This correctly implies that plugin names should only include digits, letters, and underscores.
-f "$plugin_dir/$_"
This second condition specifies that the item must be a file.
-f is one of Perl's file test operators. It returns true only if the string it's looking at names a plain file (as opposed to a directory or another type of filesystem item).
It's important to note that readdir returns only the names of the directory entries, not including path information.
Because of this, we prepend to each plugin name the path to the plugins directory that the user defined as $plugin_dir.
Compare
$plugin_dir/$_, which will be something like
'/path/to/plugin/dir/file_name'
and
$_
which is just 'file_name'
-f filename would look for filename in the current working directory, which is typically the directory containing blosxom.cgi when the script starts.
This ensures that -f is looking in the right place for the names we're testing.
It's the difference between:
-f "interpolate_fancy"
which will fail, even if the interpolate_fancy plugin exists in the correct plugin directory, and
-f "/Library/WebServer/Data/Blosxom/plugins/interpolate_fancy"
Also note the use of $_,
which is a default variable name often used by Perl. In this case $_ is set to the value of each item being looked at by grep in turn.
grep does not modify the list it is given. It returns a new, filtered list (which will at the very least be missing '.' and '..'), and the foreach loop is run on that filtered list.
foreach is another loop control structure.
The variable $plugin is assigned each value from the list returned from grep, and the block is run repeatedly until the list is exhausted.
So we see that the block is running against each plugin found in $plugin_dir.
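A small sketch of the readdir, sort, and grep pipeline run against a throwaway directory. The file names are invented, and a lexical directory handle is used in place of the bareword PLUGINS; neither detail comes from the script.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary stand-in for $plugin_dir, removed when the program exits.
my $plugin_dir = tempdir(CLEANUP => 1);

# Hypothetical directory contents: two usable names, three unusable ones.
for my $name ('01first', 'second_', 'notes.txt', 'backup~', '.hidden') {
    open my $fh, '>', "$plugin_dir/$name" or die "cannot create $name: $!";
    close $fh;
}
mkdir "$plugin_dir/subdir" or die "cannot create subdir: $!";

opendir my $dh, $plugin_dir or die "cannot open $plugin_dir: $!";
my @candidates = grep { /^\w+$/ && -f "$plugin_dir/$_" } sort readdir($dh);
closedir $dh;

print join(', ', @candidates), "\n";    # prints "01first, second_"
```

'notes.txt', 'backup~' and '.hidden' are rejected by the pattern because '.' and '~' are not word characters. 'subdir' passes the pattern but fails the -f test.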
167. next if ($plugin =~ /~$/); # Ignore emacs backups
This statement causes the loop to skip all files that match the pattern ~$
You should be able to recognize this as a statement with a modifier. First we test the condition
$plugin =~ /~$/
and evaluate the expression
next
only if the condition is true.
$plugin =~ /~$/ is true if the string in $plugin matches the pattern ~$, which specifies a tilde ('~') at the end of the string.
next
skips the evaluation of the rest of the statements in the foreach loop and immediately continues at the top of the loop with the next $plugin.
Note that '~' is not matched by \w, so the grep on line 166 has already removed any name containing a tilde before we get here. This check is therefore a redundant safeguard: it makes explicit that files ending with ~ in $plugin_dir should never be treated as plugins.
168. my($plugin_name, $off) = $plugin =~ /^\d*(\w+?)(_?)$/;
$plugin is the name of one of our plugins.
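This line matches against that name and, using parentheses in the pattern, assigns portions of the matched string to the two variables $plugin_name and $off.
^\d*(\w+?)(_?)$
Matches, in order: the beginning of the string (^), zero or more digits (\d*), one or more word characters (\w+?, captured), an optional underscore (_?, captured), and the end of the string ($).
The question mark in (_?) indicates that this portion of the pattern is optional. Here the question mark is a quantifier instructing the regex engine to match zero or one occurrence of the preceding character.
The question mark in \w+? does a different job. Following another quantifier, it makes that quantifier non-greedy, so \w+ matches as few characters as it can and leaves a trailing underscore for (_?) to pick up.
The portion of the string matching \w+? is assigned to the variable $plugin_name.
The portion matching _? is assigned to the variable $off.
Note: because _? is optional, $off may be assigned the empty string. $plugin_name always receives at least one character, because \w+? requires one.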
For example, maybe we're using the interpolate_fancy plugin. Then that plugin will be included in the $plugin_dir, maybe with the name
interpolate_fancy
If you've read the project documentation then you'll know that we can enforce strict ordering by prepending plugin names with numbers. These numbers are matched by the pattern but not assigned to $plugin_name. This correctly implies that plugin names should not begin with digits. (This is covered in the documentation).
We'll keep reading to discover the significance of $off, but if you've read the documentation you might be able to guess. After the pattern match, $off will contain either '_' or the empty string, ''.
From the documentation we know that we can disable a plugin by appending '_' to the plugin's name.
$off will contain '_' if the plugin name ends in '_' and it will contain '' otherwise.
169. my $on_off = $off eq '_' ? -1 : 1;
Based on the value of $off, which will be either '_' or the empty string, '', $on_off is assigned -1 if $off is '_' and 1 otherwise.
We know from the documentation that '_', when appended to the end of a plugin name, is used to indicate that a plugin should be treated as inactive.
So if the plugin is to be treated as inactive, $off is '_', and $on_off has the value '-1'.
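To see lines 168 and 169 at work, here is a short sketch that runs both statements over a handful of made-up plugin file names (the names are illustrative and are not taken from any particular installation).

```perl
use strict;
use warnings;

my %result;

# Hypothetical file names: plain, disabled, ordered, ordered and disabled.
my @names = ('interpolate_fancy', 'interpolate_fancy_', '01writeback', '02seemore_');

for my $plugin (@names) {
    my ($plugin_name, $off) = $plugin =~ /^\d*(\w+?)(_?)$/;
    my $on_off = $off eq '_' ? -1 : 1;
    $result{$plugin} = [ $plugin_name, $on_off ];
    printf "%-20s name=%-18s on_off=%d\n", $plugin, $plugin_name, $on_off;
}
```

Leading digits and a trailing underscore are both stripped from the name, so '02seemore_' yields the plugin name 'seemore' with an $on_off of -1. Underscores inside a name, as in 'interpolate_fancy', are kept.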
170. require "$plugin_dir/$plugin";
require
Loads in external functions from a library, $plugin_dir/$plugin in this case. After this line we can refer to functions defined in the plugin file.
171. $plugin_name->start() and ( $plugins{$plugin_name} = $on_off ) and push @plugins, $plugin_name;
You should recognize that this line is yet another example of the use of a partial evaluation operator, this time connecting three subexpressions.
Reminder: Such statements are evaluated from left to right.
'and' is a short-circuited logical operator, meaning that evaluation of the entire statement ends as soon as the truth or falsehood of the statement is known.
Because 'and' requires all subexpressions to be true, we must evaluate all of the expressions to determine that the statement is true, but can determine that the entire statement is false as soon as we encounter a single false expression.
We can order expressions in such a way that we can depend on values in later subexpressions, if we confirm those values in earlier ones, because if the earlier statement were not true, subsequent expressions will be harmlessly ignored.
This type of partial evaluation expression is very heavily used by blosxom.
These statements can be confusing, so be careful.
OK, enough of that, what is this statement saying?
Looking at it one expression at a time from left to right:
$plugin_name->start()
We call the plugin's start() subroutine. We can do this because of the presence of the require statement above.
We know from the documentation that this routine must return 1, a true value, to inform blosxom that it should consider the plugin active.
From the documentation
The start subroutine is required.
Its purpose is to let Blosxom know that it has indeed loaded a plugin and should consider it active.
Inform Blosxom so by returning a 1 (true),
as shown in this simplest of possible examples:
sub start { 1; }
This lets Blosxom know that it should consider the plugin alive and well and should offer it the ability to act at each upcoming callback point.
If $plugin_name->start() does not evaluate as true, then we skip the rest of this long statement.
Assuming $plugin_name->start() is true, we continue evaluating the statement
and ( $plugins{$plugin_name} = $on_off )
Here we create a key/value pair in the %plugins hash, assigning the value of $on_off (either -1 or 1) to the key $plugin_name, which appropriately enough is the name of the current plugin.
The value of $on_off is determined in line 169.
We can see now that the %plugins hash is keyed by $plugin_name and stores status information of all plugins in $plugin_dir (namely the $on_off value).
Note that disabled plugins (plugins with name ending in an underscore, set by the user) are included in the hash but inactive plugins (those for which $plugin_name->start() does not evaluate as true) are not.
If $plugin_name->start() does not return true, we'll never make it to this expression.
Also keep in mind that just because a plugin is represented in this hash, it does not mean the plugin is active. An $on_off value of -1 indicates that the user has disabled it.
push @plugins, $plugin_name
Finally, assuming the other two expressions are true, we push the current plugin onto the end of the @plugins array.
@plugins is apparently a list of all valid plugins for which plugin->start() is true.
This list may include plugins disabled by the user.
Notice that this line is essentially an oddly written if block.
If the first expression is true, then we take two actions as defined by the 2nd and 3rd subexpression.
This could, and probably should, be rewritten as an if clause without being made any less efficient.
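A sketch of what that rewrite might look like. The demo_plugin package is a hypothetical stand-in for a plugin file loaded with require; only the variable names come from the script.

```perl
use strict;
use warnings;

# Hypothetical plugin: the smallest thing that satisfies the start() contract.
package demo_plugin;
sub start { 1 }

package main;

my (%plugins, @plugins);
my ($plugin_name, $on_off) = ('demo_plugin', 1);

# Equivalent to:
#   $plugin_name->start() and ( $plugins{$plugin_name} = $on_off )
#       and push @plugins, $plugin_name;
if ( $plugin_name->start() ) {
    $plugins{$plugin_name} = $on_off;
    push @plugins, $plugin_name;
}

print "registered: @plugins\n";    # prints "registered: demo_plugin"
```

The two forms behave the same only because $on_off is always 1 or -1, both of which are true. If it could ever be 0, the chained version would skip the push while the if block would not.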
172. }
End of foreach loop started at line 166.
173. closedir PLUGINS;
Closes the directory handle after running through the plugins directory completely.
174. }
End of if block started at line 165.
Keep in mind that we skipped this entire block if $plugin_dir is not defined by the user.
175. blank line
176. # Plugins: Template
177. # Allow for the first encountered plugin::template subroutine to override the
178. # default built-in template subroutine
179. my $tmp; foreach my $plugin ( @plugins ) { $plugins{$plugin} > 0 and $plugin->can('template') and defined($tmp = $plugin->template()) and $template = $tmp and last; }
This is an ugly line. We'll take it in pieces.
The line starts with the statement
my $tmp;
which declares the lexical variable $tmp.
Next we see
foreach my $plugin ( @plugins ) { $plugins{$plugin} > 0 and $plugin->can('template') and defined($tmp = $plugin->template()) and $template = $tmp and last; }
This is a foreach loop collapsed to a single line. It will look more familiar to you rewritten as
foreach my $plugin ( @plugins ) {
$plugins{$plugin} > 0 and $plugin->can('template') and
defined($tmp = $plugin->template()) and $template = $tmp
and last;
}
foreach my $plugin ( @plugins ) {
This is the start of the foreach loop which runs through all of the plugins listed in @plugins. These are the active plugins, in the sense that $plugin->start() returned true, but this list may include plugins disabled by the user (e.g. interpolate_fancy_).
Each is set to $plugin in turn.
$plugins{$plugin} > 0 and $plugin->can('template') and
defined($tmp = $plugin->template()) and $template = $tmp
and last;
Yet another long 'and'ed partial eval statement.
Let's take it piece by piece, from left to right.
$plugins{$plugin} > 0
Remember that %plugins contains $on_off values for active plugins where '1' means on and '-1' means off.
We only continue considering the current plugin if it has not been disabled by the user.
If the value at $plugins{$plugin} is <= 0 we skip the rest of the statement, with the result that we do nothing with the plugin, and nothing is exactly what we want to do with disabled plugins.
If $plugin has not been disabled by the user $plugins{$plugin} > 0 will be true and we continue with the statement.
and $plugin->can('template')
Here we're testing to see if the current plugin has a template subroutine.
If the return from the can method is true, then $plugin claims to have a template() method, which means we should be able to safely refer to $plugin->template().
from perldoc
$obj->can(METHOD)
can checks if the object or class has a method called method. If it does then a reference to the sub is returned. If it does not then undef is returned.
and defined($tmp = $plugin->template())
In this line we assign the reference to the anonymous subroutine returned by $plugin->template() to $tmp and then check that $tmp has a defined value.
and $template = $tmp
Next, we assign the value from $tmp to $template, overwriting the reference to the default template subroutine that was in $template before now.
Why use $tmp at all?
Because we do not want to overwrite $template until we are pretty sure we have a valid (defined) reference to a new template subroutine.
Once we know that $tmp is defined, we can pass its value to $template.
and last
Finally, the last operator drops us out of the foreach loop.
This means that we will stop looking for replacement template routines after the first one we find.
This is consistent with what the documentation tells us:
The first encountered plugin::template subroutine overrides the default.
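The same selection logic can be sketched with the chain unrolled into separate tests. The four packages below are hypothetical plugins invented for the demonstration; they are not real Blosxom plugins.

```perl
use strict;
use warnings;

# Hypothetical plugins, defined inline instead of being loaded with require.
package no_template;  sub start { 1 }
package disabled_one; sub start { 1 } sub template { return sub { 'from disabled_one' } }
package first_match;  sub start { 1 } sub template { return sub { 'from first_match' } }
package second_match; sub start { 1 } sub template { return sub { 'from second_match' } }

package main;

my @plugins = qw(no_template disabled_one first_match second_match);
my %plugins = (no_template => 1, disabled_one => -1, first_match => 1, second_match => 1);

# Stand-in for the built-in default template routine.
my $template = sub { 'built-in default' };

for my $plugin (@plugins) {
    next unless $plugins{$plugin} > 0;       # skip plugins disabled by the user
    next unless $plugin->can('template');    # skip plugins without a template sub
    my $tmp = $plugin->template();
    next unless defined $tmp;                # keep the old value if nothing came back
    $template = $tmp;
    last;                                    # the first override wins
}

print $template->(), "\n";    # prints "from first_match"
```

no_template is passed over because can('template') is false, disabled_one because its %plugins value is -1, and second_match is never consulted because last has already ended the loop.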
180. blank line
181. # Provide backward compatibility for Blosxom < 2.0rc1 plug-ins
I'm skipping the next three lines because they only provide compatibility with previous versions of blosxom.
Even the newest blosxom code is old at this point. Everyone has had plenty of time (many years) to upgrade.
No one should be using pre-2.0rc1 versions of blosxom. Upgrade to either 2.0 or 2.0.2. There is really no reason not to use 2.0.2.
182. sub load_template {
See notes at line 181.
183. return &$template(@_);
See notes at line 181.
184. }
See notes at line 181.
185. blank line
186. # Define default entries subroutine
187. $entries =
This looks like any other assignment statement at this point.
We'll see on the next line that what we're assigning to the variable is actually a reference to a subroutine.
We're assigning $entries a reference to the anonymous subroutine that follows, which is the baked in entries routine.
We have seen this once before, line 138, and we will see this same pattern again each time we set the default for one of: start (line 138), entries (line 187), head (line 328), sort (line 350), date (line 385), story (line 401), foot (line 423), filter (not defined in blosxom.cgi but available to plugins -see documentation), and end (not defined in blosxom.cgi but available to plugins -see documentation)
188. sub {
The start of the anonymous subroutine that will serve as the default entries routine.
189. my(%files, %indexes, %others);
Here we declare three lexical variables: %files, %indexes, %others
We'll discuss each of these as they're used.
The only thing we know about them now is that they are all hashes.
190. find(
This is the beginning of a call to the File::Find library's find routine.
find(\&wanted, @directories_to_search);
find expects two arguments.
The first is a reference to a subroutine that is to be run against every file and directory found as Perl recurses through the directory structure starting at @directories_to_search.
What follows this line is the definition of the subroutine passed to find as \&wanted.
Jumping ahead in the code, we can see the second argument to find on line 228 is $datadir
This makes sense, and is probably what you'd expect. We're looking for entries so we start at $datadir.
Some important things to keep in mind about the operation of find() -see man File::Find for more info.
------ From the documentation
"find()" does a depth-first search over the given @directories in the order they are given.
For each file or directory found, it calls the &wanted subroutine.
Additionally, for each directory found, it will "chdir()" into that directory and continue the search, invoking the &wanted function on each file or subdirectory in the directory.
The wanted function takes no arguments but rather does its work through a collection of variables.
- $File::Find::dir is the current directory name,
- $_ is the current filename within that directory
- $File::Find::name is the complete pathname to the file.
Don't modify these variables.
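A minimal, self-contained sketch of find() with an anonymous wanted subroutine, run over a throwaway directory tree. The directory and file names are made up for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a small hypothetical tree: a.txt, tech/b.txt, tech/perl/c.txt
my $root = tempdir(CLEANUP => 1);
make_path("$root/tech/perl");
for my $file ("$root/a.txt", "$root/tech/b.txt", "$root/tech/perl/c.txt") {
    open my $fh, '>', $file or die "cannot create $file: $!";
    close $fh;
}

my @seen;
find(
    sub {
        # find() has chdir'ed into $File::Find::dir, so the bare name in $_ works here.
        return unless -f $_;
        # $File::Find::name is the full path; strip the root for display.
        (my $relative = $File::Find::name) =~ s/^\Q$root\E\///;
        push @seen, $relative;
    },
    $root
);

print "$_\n" for sort @seen;
```

The wanted subroutine receives no arguments. Everything it needs arrives through $_, $File::Find::dir and $File::Find::name, and it reports back by pushing onto @seen, a variable it shares with the surrounding code.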
191. sub {
This is the start of the definition of an anonymous subroutine that will serve as &wanted, a reference to which is passed to find().
See the notes for line 190 for more details.
Note that we could have defined this function elsewhere and simply passed a reference to it here, which would have avoided spreading the call to find() over nearly 40 lines.
192. my $d;
A simple variable declaration.
As we'll see, the variable holds the date string built from the values returned by nice_date(), and is used as both key and value for the %indexes hash.
193. my $curr_depth = $File::Find::dir =~ tr[/][];
$File::Find::dir
is the complete directory name of the current directory, from the root, as the find routine is making its way through the $datadir hierarchy.
$File::Find::dir =~ tr[/]
We're counting the number of forward slashes, '/', (path delimiters) in the complete name and storing that value in $curr_depth.
We've seen this use of tr[/] before in line 104. See the notes at line 104 for more information.
194. return if $depth and $curr_depth > $depth;
We've seen expression modifiers like this before.
Essentially this is an if statement with the condition
$depth and $curr_depth > $depth
The expression $depth evaluates to true if it is anything other than zero.
Remember that $depth is one of our user configurable variables. A value of zero indicates infinite depth.
If $depth is zero, we want blosxom to consider all directories under the requested directory.
In this case, because 0 evaluates as false, we do not evaluate the second part of the condition which compares $curr_depth and $depth.
If $depth is not 0 then we do evaluate the second half of the condition.
and $curr_depth > $depth
This is true if $curr_depth is greater than the user configurable $depth value.
For example:
If the user has requested that blosxom consider only posts no more than 3 levels under the requested directory ($depth = 3), we should stop if $curr_depth is 4.
If both $depth and $curr_depth > $depth are true we immediately return from the function at this point, otherwise we continue.
What does all this mean?
find() will run this function on every item below $datadir but we want to give users the option of limiting the script to a maximum level under the requested directory. To accomplish this we cut the function short for every call after we have exceeded $depth, so that we essentially ignore all posts below $depth.
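A short sketch of the depth test on its own. The directory names are invented, and for simplicity $depth is treated here as an absolute count of '/' characters, which is an assumption of this example.

```perl
use strict;
use warnings;

my $depth = 3;    # 0 would mean "no limit"

# Hypothetical directories, one level deeper each time.
my @dirs = ('/data', '/data/tech', '/data/tech/perl', '/data/tech/perl/old');

my @kept;
for my $dir (@dirs) {
    # tr with an empty replacement list changes nothing; it just counts matches.
    my $curr_depth = $dir =~ tr[/][];
    next if $depth and $curr_depth > $depth;
    push @kept, $dir;
}

print "$_\n" for @kept;
```

With $depth set to 3 the fourth directory, which contains four slashes, is skipped. With $depth set to 0 the first half of the condition is false, the comparison is never made, and all four directories are kept.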
195. blank line
196. if (
Start of conditional block.
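The condition here will define what blosxom considers a post, which is, as we will see, a readable file under $datadir whose name ends in .$file_extension, is not 'index', and does not begin with a dot.
See lines 197 - 200 for more info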
197. # a match
198. $File::Find::name =~ m!^$datadir/(?:(.*)/)?(.+)\.$file_extension$!
$File::Find::name is the complete path to the current file,
e.g. /some/path/foo.ext
In this line we are doing nothing more than attempting a match on $File::Find::name
m!^$datadir/(?:(.*)/)?(.+)\.$file_extension$!
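This pattern matches, in order: the beginning of the string (^), the value of $datadir followed by '/', an optional run of any characters followed by '/' ((?:(.*)/)?), one or more of any character ((.+)), a literal dot (\.), and the value of $file_extension at the end of the string ($).
What about (?:
It instructs Perl's regex engine that this set of parentheses should be used for grouping only, and should not trigger the creation of a match variable.
The inner parentheses, (.*), do trigger a match variable, and because the enclosing set of parens began with ?:, the portion of the string matched by .* is assigned to the variable $1.
?: is intended to be used precisely for this purpose, to avoid creating unnecessary variables.
In summary, this portion of the condition is satisfied if the complete path to the current file begins with $datadir/ and ends with a non-empty name followed by .$file_extension.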
Important:
The quantifiers used by Perl's regex engine are greedy, so (?:(.*)/) will match as much of the string as possible.
Because .* matches any number of any character, this may include internal '/' characters.
e.g., given
$datadir/some_dir/sub_dir/sub_sub_dir/filename.extension
(?:(.*)/) will match all of the substring
'some_dir/sub_dir/sub_sub_dir/'
The two memory variables created contain:
$1 - The portion of the path after $datadir, not including a leading and trailing delimiter ('/'), and not including the filename.
$2 - The current filename, not including the $file_extension.
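The following sketch applies the pattern from line 198 to a few sample paths. The values of $datadir and $file_extension and the paths themselves are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical configuration values.
my $datadir        = '/var/blog';
my $file_extension = 'txt';

my @paths = (
    '/var/blog/hello.txt',
    '/var/blog/tech/perl/first.txt',
    '/var/blog/tech/notes.html',
);

my %parsed;
for my $name (@paths) {
    if ( $name =~ m!^$datadir/(?:(.*)/)?(.+)\.$file_extension$! ) {
        # $1 is undefined when the file sits directly in $datadir.
        my $dir = defined $1 ? $1 : '';
        $parsed{$name} = [ $dir, $2 ];
        print "$name => dir='$dir' file='$2'\n";
    }
    else {
        print "$name => no match\n";
    }
}
```

For the second path the greedy .* captures 'tech/perl', internal slash included. For the first path the optional group is skipped entirely, which is why $1 has to be checked with defined before it is used. The third path fails because it does not end in .txt.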
199. # not an index, .file, and is readable
200. and $2 ne 'index' and $2 !~ /^\./ and (-r $File::Find::name)
This line continues the conditional expression started at line 198.
and $2 ne 'index'
Assuming we've gotten to this point $2 contains a filename not including the .$file_extension.
This portion of the expression returns true if the value of $2 is anything other than 'index'.
So find(), and blosxom, skip any index files encountered.
Assuming $2 is not 'index' then we continue with the condition
and $2 !~ /^\./
This portion of the conditional is again an attempt to match against $2,
the pattern ^\. specifies a literal dot ('.') at the beginning of the string.
This part of the condition succeeds only if the match fails (compare !~ and =~)
So find(), and blosxom, skip over dot files (because we do not match if the filename begins with a dot).
Assuming $2 does not begin with a dot ('.'), we continue evaluating the condition
and (-r $File::Find::name)
Finally, this portion of the condition checks that the current file is readable.
find(), and blosxom, skip any non-readable files, (because we do not match if the file is not readable).
201. ) {
End of condition expression that started on line 196, and the beginning of the conditional block.
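To be absolutely clear, we match readable files under $datadir whose names end in .$file_extension, other than index files and dot files.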
202. blank line
203. # to show or not to show future entries
204. (
This is the beginning of a long statement spread out over lines 204 to 222.
The purpose of this statement is to determine whether the file currently being considered, which is $File::Find::name, should be output.
Note that there is no semicolon at the end of the statement (line 222).
Why?
Because a semicolon is not technically required for the last statement in a block. Use of a semicolon is still generally encouraged, especially for a statement like this, spread out over 19 lines.
205. $show_future_entries
The first subexpression is spread out over these 4 lines, 204 - 207.
Because this is part of a statement connected by logical operators, evaluation starts on the left and continues, or stops, as directed by the values of the subexpressions, and the logical operators we encounter.
Notice that this subexpression is itself composed of the 'or' of two subexpressions.
The expression evaluates to true if one, the other, or both of the subexpressions are true.
Also, as we've seen before, evaluation stops as soon as we can determine the truth or falsehood of the expression. We may not evaluate all of the subexpressions.
$show_future_entries is the user configurable variable defined at line 42.
A value of 0 indicates that the user does not want to show future entries (post-dated entries, or entries with modification times occurring at some point in the future relative to now).
A value of 1 indicates a preference to display future entries.
If the value is 1, a true value, then we're done with this subexpression, which also evaluates as true, and pick up with the next starting at line 210.
Otherwise we continue to the second subexpression...
206. or stat($File::Find::name)->mtime < time
Here we compare the modification time on the file currently being considered,
stat($File::Find::name)->mtime,
to the present time, returned by the Perl function time.
Conveniently, times returned by stat and time are in the same format (or they would not be directly comparable). Both return a value in Unix timestamp format, which is (roughly) the number of seconds since the epoch, an easy value for us to work with.
If none of that sounds familiar to you, don't worry about it too much. For our purposes it means that the number returned is a simple integer value that's perfect for comparisons like this. If I'm comparing the time now to a time from a week ago, then the time now will be a larger number because some number of seconds will have passed over the week. More specifically, the time now will be time_a_week_ago + 604800 (the number of seconds in a week).
If the modification time is less than (<) the present time, that is to say if the modification time is in the past, then the subexpression returns 1, a true value, and the subexpression (line 204 - 207) is true.
At this point we've seen the first subexpression and determined its truth or falseness.
The subexpression is connected to the rest of the statement (continuing to line 222) by the logical operator 'and'.
Because all expressions connected by 'and' must be true, if this first subexpression returns false, then we are done with the entire statement that runs from 204 - 222.
Otherwise, we continue evaluating the rest of the statement.
To summarize the first subexpression
First we check the value of $show_future_entries, because if the user has instructed blosxom to display posts with future modification times then the modification time of the file is unimportant here.
On the other hand, if future entries should not be output then we must compare the modification time on the file to the current time and we cut short the entries routine if the file has a future modification time.
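A sketch of the future-entry test, pulled out into a helper. The is_visible subroutine is hypothetical and does not exist in blosxom.cgi, and the example assumes the object interface of the File::stat module, as the ->mtime call in the source suggests.

```perl
use strict;
use warnings;
use File::stat;
use File::Temp qw(tempfile);

# Hypothetical helper: true if the file should be shown.
sub is_visible {
    my ($file, $show_future_entries) = @_;
    return 1 if $show_future_entries;      # the user wants everything shown
    my $mtime = stat($file)->mtime;        # seconds since the epoch
    return $mtime < time ? 1 : 0;          # otherwise only files dated in the past
}

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

my $now = time;

# Date the file one hour in the past.
utime $now - 3600, $now - 3600, $path or die "utime failed: $!";
my $past_entry = is_visible($path, 0);

# Date the file one day in the future.
utime $now + 86400, $now + 86400, $path or die "utime failed: $!";
my $future_hidden = is_visible($path, 0);
my $future_shown  = is_visible($path, 1);

print "past=$past_entry future_hidden=$future_hidden future_shown=$future_shown\n";
```

The early return mirrors the short circuit in the original: when $show_future_entries is true, stat is never called at all.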
207. )
End of the expression that started at line 204, which is part of the statement that runs to line 222.
208. blank line
209. # add the file and its associated mtime to the list of files
210. and $files{$File::Find::name} = stat($File::Find::name)->mtime
This line should always evaluate as true.
In fact this (the entire statement) is yet another example of the use of partial evaluation operators to control the execution of the script and would be written more traditionally as a conditional block. I would go so far as to say that this is a particularly good example of the problem you get into when overusing these sorts of constructions.
Anyway, remember that we declared a lexical %files hash (scoped to this subroutine) at line 189.
Here we get our first look at what we'll be doing with that hash.
The %files hash is a collection of pairs where each key is a filename (the name of a file which has not been eliminated from consideration as a post) and the corresponding value is the file's current modification time in Unix timestamp format, as returned from the call to stat.
This expression stores the modification time value keyed by the filename and we continue on in the statement.
211. blank line
212. # static rendering bits
213. and (
Start of the next subexpression that runs to line 217.
The next bit of the statement is spread out over these lines 213 - 217.
This subexpression is itself composed of 3 subexpressions. We'll consider each in turn.
Note that because these are connected with the logical operator 'or' the expression is true as soon as any of the three subexpressions is determined to be true.
As the comment in blosxom.cgi explains, this expression deals with static rendering.
214. param('-all')
From the documentation
"To force Blosxom to regenerate all pages, add another command-line switch, -all=1 , like so:
% perl blosxom.cgi -password='whateveryourpassword' -all=1"
So if param('-all') is true then blosxom should generate all pages.
If this is determined to be true, the evaluation of this expression is complete.
Otherwise we continue...
215. or !-f "$static_dir/$1/index." . $static_flavours[0]
Note that the variable $1 refers to the match made at line 198.
From that match
$1 is the portion of the path after $datadir, not including a leading and trailing delimiter ('/'), and not including the filename.
$static_flavours[0] refers to the first element of the user configurable @static_flavours array defined at line 61.
For example if the array were defined as follows (the default)
@static_flavours = qw/html rss/;
then $static_flavours[0] is 'html'.
This string ($static_flavours[0]) is concatenated to the path $static_dir/$1/index to give us the complete path to an index page for a requested flavour in the user-configured $static_dir directory.
We check to see if this file exists.
-f is a test to determine if it is a file; it may be preferable to simply check for existence (-e) instead.
Note the use of the negation operator (!). Because of its presence, the expression evaluates to true if not -f, that is, if "$static_dir/$1/index." . $static_flavours[0] is not a file.
If an index for the first of the @static_flavours corresponding to the current path in $datadir does not exist (is not a file) at the corresponding location in $static_dir then this expression is true.
Otherwise, we continue...
216. or stat("$static_dir/$1/index." . $static_flavours[0])->mtime < stat($File::Find::name)->mtime
This is similar to the sort of time comparison we saw at line 206, but in this case we're comparing the modification time of the static 'index.' file, created sometime before now (as discussed in the note at line 210), to the modification time of the file currently being processed.
The less than operator (<) evaluates to true if the static index file is older than the modification time of the current file.
Keep in mind that we know the statically generated index exists because we must have failed the previous file test (!-f) if we are considering this part of the expression.
It should make sense to you that if any current file was modified after (is newer than) the static index page (which must include the current file) then we need to recreate the static file to include the changes.
Otherwise, nothing has changed, as far as the current file is concerned.
If this expression is false, we are finished with our long statement (lines 204 - 222), which is determined to be false.
Otherwise, if this portion of the statement evaluates to true, we continue to the next subexpression.
217. )
End of the expression that started at line 213, which is part of the statement that runs to line 222.
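The three-way 'or' on lines 214 - 216 can be sketched as a small helper. The needs_rebuild and touch subroutines and the file names are hypothetical, and File::stat is assumed for the ->mtime calls.

```perl
use strict;
use warnings;
use File::stat;
use File::Temp qw(tempdir);

# Hypothetical helper mirroring the three alternatives, in the same order.
sub needs_rebuild {
    my ($index, $entry, $force_all) = @_;
    return 1 if $force_all;            # like param('-all')
    return 1 if !-f $index;            # no static index exists yet
    my $index_mtime = stat($index)->mtime;
    my $entry_mtime = stat($entry)->mtime;
    return $index_mtime < $entry_mtime ? 1 : 0;    # index older than the entry
}

# Hypothetical helper: create the file if needed and set its modification time.
sub touch {
    my ($file, $when) = @_;
    open my $fh, '>>', $file or die "cannot open $file: $!";
    close $fh;
    utime $when, $when, $file or die "utime failed on $file: $!";
}

my $dir   = tempdir(CLEANUP => 1);
my $entry = "$dir/post.txt";
my $index = "$dir/index.html";
my $now   = time;

touch($entry, $now - 100);
my $missing = needs_rebuild($index, $entry, 0);    # index does not exist yet

touch($index, $now - 200);
my $stale = needs_rebuild($index, $entry, 0);      # index older than entry

touch($index, $now - 50);
my $fresh  = needs_rebuild($index, $entry, 0);     # index newer than entry
my $forced = needs_rebuild($index, $entry, 1);     # forced regardless

print "missing=$missing stale=$stale fresh=$fresh forced=$forced\n";
```

The order of the tests matters for the same reason it does in the original: stat is only called on the index after the -f test has confirmed that the file is there.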
218. and $indexes{$1} = 1
We declared the %indexes hash on line 189 (with %files and %others). Here is the first mention of the hash since that declaration.
This portion of the statement should always evaluate as true.
If the entire statement were rewritten as a conditional (it might be easier to read that way), then this would be a statement in the body and not a part of the condition.
We're simply creating a new key/value pair in the %indexes hash.
Our hash key in this case is $1.
Remember from line 198 that $1 is the portion of the path after $datadir not including a leading and trailing delimiter ('/'), and not including any filename.
And the value is simply '1'.
So what is this expression and %indexes hash doing?
If we must create a static index page for the directory containing the current file, then we indicate this by storing a value of '1' in the %indexes hash, keyed by the name of the directory itself.
Later, we can use this hash to determine what index pages we need to generate.
219. and $d = join('/', (nice_date($files{$File::Find::name}))[5,2,3])
...continuing with our long statement
Assuming we've made it this far, and we won't have if any of the preceding expressions were false, then we retrieve from the %files hash the previously saved modification time (line 210) for the current file.
$files{$File::Find::name}
and pass that value to the function nice_date(), which is defined in the blosxom.cgi itself (lines 433 - 442).
nice_date($files{$File::Find::name})
We'll talk about how nice_date() works when we get there. At this point all we need to know about nice_date() is that
We pass it just the sort of value we've previously stored in our %files hash,...
namely, the mtime value returned from a call to stat(), which is an integer representing the number of seconds elapsed since midnight UTC on the morning of January 1, 1970 (referred to as the epoch).
...and it returns a list of values corresponding to the following variables
$dw, $mo, $mo_num, $da, $ti, $yr
From this list we grab the elements at indices 5, 2, and 3 (in that order). List indices start at 0, so these are $yr, $mo_num, and $da.
(nice_date($files{$File::Find::name}))[5,2,3]
and join them as a single string, separating each by the delimiter '/'.
join('/', (nice_date($files{$File::Find::name}))[5,2,3])
e.g.
Given the values above we would end up with '2006/11/24'
We store this value in $d (declared at line 192).
$d = join('/', (nice_date($files{$File::Find::name}))[5,2,3])
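The list slice and join can be tried on their own. The sketch below uses nice_date_sketch(), a hypothetical stand-in we wrote for illustration; it is not blosxom's nice_date(). It returns values in the same order ($dw, $mo, $mo_num, $da, $ti, $yr), but it relies on gmtime so that the output does not depend on the local time zone, which may differ from how the real function behaves.

```perl
use strict;
use warnings;

# Hypothetical stand-in for nice_date(): same order of return values,
# computed from gmtime so the result is the same on every machine.
sub nice_date_sketch {
    my ($unixtime) = @_;
    my @days   = qw(Sun Mon Tue Wed Thu Fri Sat);
    my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    my ($min, $hour, $mday, $mon, $year, $wday) = (gmtime($unixtime))[1 .. 6];
    return (
        $days[$wday],                          # index 0: $dw
        $months[$mon],                         # index 1: $mo
        sprintf('%02d', $mon + 1),             # index 2: $mo_num
        sprintf('%02d', $mday),                # index 3: $da
        sprintf('%02d:%02d', $hour, $min),     # index 4: $ti
        $year + 1900,                          # index 5: $yr
    );
}

my $mtime = 1164369600;    # 2006-11-24 12:00:00 UTC, in seconds since the epoch
my $d = join('/', (nice_date_sketch($mtime))[5, 2, 3]);
print "$d\n";    # 2006/11/24
```

The parentheses around the function call matter: they put the call in list context so that the [5, 2, 3] slice can pick out the year, the month number, and the day.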
220. blank line
221. and $indexes{$d} = $d
We just saw use of the %indexes hash on line 218.
In that case we were storing the following key/value pairs:
keys: directory paths, eg 'Technology/Computer/Apple'
value: 1, a status flag indicating that we need to generate an index page for the directory named by the key.
Here we are using the same hash to store the following pairs:
keys: the date strings temporarily stored in $d (line 219), built from the list of values returned by nice_date() (line 219)
values: that same $d string
For example:
the key '2006/11/24' has the value '2006/11/24', so $indexes{'2006/11/24'} is '2006/11/24'.
It seems that the purpose of the %indexes hash in both cases is to keep track of the static index pages we'll need to generate during static mode operation.
Remember that Blosxom supports both a category-based and a date-based archive scheme.
We'll need to create indexes for both when statically rendering the site.
222. and $static_entries and $indexes{ ($1 ? "$1/" : '') . "$2.$file_extension" } = 1
We've finally arrived at the last line in our long (long) statement.
This is the final subexpression that completes the picture of the statement we've built up starting with line 204.
Remember that we'll only get to this point in the statement if all of the previous expressions evaluate as true.
Assuming that is the case, $static_entries is a user configurable variable, a switch indicating a preference to generate static files for each individual post (1) or not (0).
What does that mean?
To this point we've only been dealing with index files.
For each directory in the category hierarchy and each level of date-based archive scheme, we generate a single file containing all of the posts that belong in that directory or date range.
Just as blosxom is capable of generating pages to specific posts when running dynamically, we can generate pages for specific posts in static mode.
Why wouldn't you want to do this in static mode? If you have a lot of posts, then this will generate many, many files. Specifically, one file per post, per category/date, per static flavor.
Realize that this may mean that a single post is replicated many times if it occurs in a deeply nested category and also once each for the year, month and day that the date-based scheme requires.
and $static_entries
If this variable ($static_entries) holds a true value such as 1, then we move past this first part of the expression and continue evaluating the statement.
The next part of this expression isn't particularly easy to read.
It uses the ternary operator, the memory variables $1 and $2 from line 198, and the %indexes hash again.
Here's how it works:
and $indexes{ ($1 ? "$1/" : '') . "$2.$file_extension" } = 1
We will be storing the value of 1 somewhere in the %indexes hash.
At what key?
The ternary operator gives us two possibilities.
$1 is evaluated.
When defined at line 198 this memory variable was part of an optional component in the regular expression.
It contains either the path after $datadir, not including a leading or trailing delimiter ('/') and not including the filename, or, if the optional component did not match, it is undefined, which the ternary operator treats just like the empty string ''.
In the first case, we evaluate the string "$1/", which is simply the value of $1 with a trailing forward slash (/) appended.
Remember that if $1 holds anything at all, it is the path below $datadir to the directory containing the requested post. Here we append a forward slash as a delimiter. (We know it's not already present because it was stripped when we created the variable at line 198.)
In the second case $1 is empty and we evaluate '', which is simply ''. This will be the case if there is no path between $datadir and the requested post.
In other words, $1 will be empty if the requested post is at the root of blosxom's data directory.
Either way we concatenate this with $2.$file_extension and this is our key.
Remember that $2, again from line 198, is the name of the requested post, without the file extension.
We append a literal dot (.) and the value of $file_extension, the user configurable variable set at line 37.
So, the key in this case is the complete path from the root of the data directory to the requested file including the filename and extension.
The value is 1.
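The two outcomes of the ternary are easy to check with a small sketch. entry_key() is a hypothetical helper written for this illustration; its arguments stand in for $1, $2, and $file_extension, and the sample names are made up.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the key expression on line 222.
# $dir stands in for $1 and $name for $2.
sub entry_key {
    my ($dir, $name, $file_extension) = @_;
    return ($dir ? "$dir/" : '') . "$name.$file_extension";
}

# A post inside a category directory: the directory and a slash are prepended.
print entry_key('Technology/Computer/Apple', 'ipod', 'txt'), "\n";    # Technology/Computer/Apple/ipod.txt

# A post at the root of the data directory: nothing is prepended.
print entry_key(undef, 'welcome', 'txt'), "\n";                        # welcome.txt
```

Without the ternary, a post at the root of the data directory would end up with a key that starts with a stray '/'.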
This is a third type of key in %indexes.
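There are now three kinds of key in the hash: category directory paths (value 1), date strings (the value is the date string itself), and paths to individual posts (value 1).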
At this point we can use the %indexes hash as a 'list of ingredients', telling us what to generate when running in static mode.
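To make the 'list of ingredients' idea concrete, here is a sketch of a hash holding one key of each kind we have met so far, together with one possible way of telling them apart. The sample keys and the kind_of_key() helper are our own illustration; this is not necessarily how blosxom itself reads the hash later on.

```perl
use strict;
use warnings;

my $file_extension = 'txt';

# One sample key for each kind stored at lines 218, 221, and 222.
my %indexes = (
    'Technology/Computer/Apple'          => 1,               # category directory
    '2006/11/24'                         => '2006/11/24',    # date string
    'Technology/Computer/Apple/ipod.txt' => 1,               # individual post
);

# Hypothetical helper: classify a key by its extension and its value.
sub kind_of_key {
    my ($key, $value) = @_;
    return 'entry' if $key =~ /\.\Q$file_extension\E$/;
    return 'date'  if $value ne '1';
    return 'category';
}

for my $key (sort keys %indexes) {
    printf "%-10s %s\n", kind_of_key($key, $indexes{$key}), $key;
}
```

Notice that the date keys are the only ones whose value is not 1, which is one way the two uses of the value could be told apart.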
223. blank line
224. }
End if block started at line 201.
225. else {
Beginning of else clause that pairs with the if starting at line 196.
226. !-d $File::Find::name and -r $File::Find::name and $others{$File::Find::name} = stat($File::Find::name)->mtime
This statement is saying:
If the requested name is not a directory, and it is readable, then store the modification time in the %others hash keyed by the file name.
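Here is the same shape of statement run against a temporary directory and a file inside it. This is only a sketch: $name stands in for $File::Find::name, the file name logo.png is made up, and we loop over two names by hand instead of letting File::Find walk the tree.

```perl
use strict;
use warnings;
use File::stat;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/logo.png";    # made-up name for a non-entry file
open my $fh, '>', $file or die "Cannot create $file: $!";
close $fh;

my %others;
for my $name ($dir, $file) {
    # Same shape as line 226: skip directories, skip unreadable items,
    # and record the modification time of everything else.
    !-d $name and -r $name and $others{$name} = stat($name)->mtime;
}

# Only the file is listed; the directory never reaches the assignment.
print join("\n", sort keys %others), "\n";
```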
If you look back at the if condition at lines 198 - 200 you'll see that we were looking for:
$File::Find::name =~ m!^$datadir/(?:(.*)/)?(.+)\.$file_extension$!
# not an index, .file, and is readable
and $2 ne 'index' and $2 !~ /^\./ and (-r $File::Find::name)
If any of those conditions fails, we end up here and evaluate the else clause; if all of them are met, the else clause is skipped.
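A quick sketch shows which names pass the pattern and name tests and which fall through to the else clause. The values of $datadir and $file_extension and the sample paths are made up, branch_for() is a hypothetical helper, and the readability test (-r) is left out so that no real files are needed.

```perl
use strict;
use warnings;

my $datadir        = '/data';    # made-up value for illustration
my $file_extension = 'txt';

# Hypothetical helper: 'entry' when the name passes the pattern and name
# tests from lines 198 - 200, 'else' otherwise. The -r test is omitted.
sub branch_for {
    my ($name) = @_;
    if (    $name =~ m!^$datadir/(?:(.*)/)?(.+)\.$file_extension$!
        and $2 ne 'index'
        and $2 !~ /^\./ )
    {
        return 'entry';
    }
    return 'else';
}

for my $name ('/data/Technology/ipod.txt', '/data/index.txt',
              '/data/.draft.txt',          '/data/logo.png') {
    printf "%-28s %s\n", $name, branch_for($name);
}
```

Only the first name is treated as an entry. The other three reach the else clause for three different reasons: the name is 'index', the name starts with a dot, and the extension does not match.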
Assuming that one of those tests fails, and keep in mind that we won't know which one has failed, what do we do next?
Let's look at the statement one expression at a time.
You'll recognize this as another statement chained together with short-circuiting (partial evaluation) operators.
!-d $File::Find::name
If the item ($File::Find::name) is not a directory.
-d is the directory test and so if we negate that, with the negation operator, (!), we evaluate to true if $File::Find::name is not a directory.
and -r $File::Find::name
If we return true for the first expression, then we evaluate this one, which is true if $File::Find::name is readable, -r
and $others{$File::Find::name} = stat($File::Find::name)->mtime
Finally, this expressi