deekayen.net
Bot filtering in Apache
Some people use robots.txt to allow only a few robots, spiders, and web crawlers. I prefer to filter the email collectors I know have a bad reputation. Since not every bot even bothers to check robots.txt, I use Apache's Limit restriction to ban certain bots from deekayen.net by their user agent string. The smart bot creators let users change the user agent string, so this method isn't foolproof, but it lets me sleep better at night.
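A minimal sketch of that kind of filter, not the exact rules running on deekayen.net. It assumes Apache 2.2-style access control with mod_setenvif loaded; the user agent substrings and the bad_bot variable name are illustrative placeholders to swap for whatever harvesters show up in your own logs.

```apache
# Flag requests whose User-Agent contains a known harvester name.
# The substrings below are examples only; bad_bot is an arbitrary variable name.
SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot

# Allow everyone, then deny any request that was flagged above.
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>
```

With Order Allow,Deny the Deny line is evaluated last, so a flagged bot is refused even though "Allow from all" matches it.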
Uniform website URLs using mod_rewrite
While some people might type in "www.deekayen.net", others, like myself, don't type "www." anymore. Besides, telling someone a website address that starts with "www." is clumsy. Some people even leave off a W and say "ww.example.com". Worse yet, I've seen people actually listen and type only two w's. I use an Apache mod_rewrite rule to send a permanent redirect to anyone who types the "www.".
RewriteCond %{HTTP_HOST} !^deekayen\.net [NC]
RewriteRule ^/?(.*) http://deekayen.net/$1 [R=permanent,L]
The first line translates to "if the host name you are requesting doesn't start with deekayen.net," followed by the second line, triggered by the first, "then you must load http://deekayen.net/ instead. $1 will include the filename you requested. This is a permanent thing, so get used to it, because otherwise I'll keep redirecting you."
This rule can also be handy if you have multiple domain aliases for the same website. For example, I don't tell anyone about deekayen.com, but www.deekayen.com and deekayen.com both redirect under this rule to deekayen.net.
Drupal cron.php restriction
I recently made my first cron contrib module for Drupal, DB Maintenance. Since the DB Maintenance OPTIMIZE TABLE query locks the database tables it queries, I don't want just anyone to access cron.php anymore. The restriction I added was for the Apache .htaccess file that manages the clean URL rewrite rule.
<Files cron.php>
Order deny,allow
Allow from 207.7.108.211 127.0.0.1
Deny from all
</Files>
207.7.108.211 is the current IP address of deekayen.net. It is needed instead of 127.0.0.1 because when you run cron.php with lynx or wget, as the documentation strongly suggests, 127.0.0.1 isn't the remote IP Apache sees when it receives the request. I only put in 127.0.0.1 so that, if I need to access cron.php from localhost for some reason in the future, I can.
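The Order, Allow, and Deny directives above are the Apache 2.2 style of access control. As a rough sketch, assuming a server running Apache 2.4 with mod_authz_host (which is not what this site was set up on), the same restriction could be written like this:

```apache
# Apache 2.4 equivalent (assumed setup): only the listed addresses may request cron.php.
<Files cron.php>
    Require ip 207.7.108.211 127.0.0.1
</Files>
```

Any request for cron.php from another address is refused with a 403.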
Atom feed for blogs
In closing the feature request to add Atom feed options for blogs in Drupal, I added an option to select how many entries to display in feeds, between 1 and 30 inclusive. Someone also wanted to be able to put HTML in their site slogan, which broke Atom compliance, so I fixed that, too.
Drupal 4.6 ad sprinkling
I had been wondering how sites sprinkle ads in irregular parts of their index page. As it turns out, it's just a matter of a simple if statement I put at the end of my Drupal node.tpl.php template file.
<?php if (!$page && ($seqid == 2 || $seqid == 6) && !preg_match("/^(132\.170|68\.2[0-2]{1}\d{1}|24\.\d{2,3}|70\.243|204\.210|209\.86)\.\d{1,3}\.\d{1,3}$/", $_SERVER['REMOTE_ADDR'])): ?>
<div class="node_ad"><!--ad code here--></div>
<?php endif; ?>
What does the preg_match do, you ask? I don't think professors at school should see the ads on my site, since I do occasionally post assignments on my own site in addition to or instead of WebCT. The regex keeps the ads from being shown to visitors from the ucf.edu netblock or the major local ISP address blocks.
