Let's say you have a portal page that you want to pull some dynamic content into, such as an RSS feed with headlines. The problem is that the page's address is already established, and it gets so much traffic that you don't really want it to be dynamic. Pulling in RSS feeds can slow down dynamic page rendering significantly, especially if you use several feeds from several sites.
So what to do?
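One way to do it is to use wget and cron: have cron run wget at regular intervals, so that it fetches the output of a PHP script and saves it as the static page.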
wget -O /path/to/the/resulting/file.htm http://www.mysite.com/script.php
The script (script.php in the command above) could go something like this:
<html>
<?php
include("static1.php");
include("rssfeed.php");
include("static2.php");
?>
</html>
Then you need to split the existing page into the fragments you include (here static1.php and static2.php, with the feed in between). There could be several more.
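One thing to watch on a busy page: wget -O starts writing the target file straight away, so a visitor could catch it half-written, and a failed fetch can leave it empty. A minimal sketch of a wrapper that fetches into a temporary file and only swaps it in when something came back. The name publish.sh and the FETCH variable are made up for this example; they are not part of wget.

```shell
#!/bin/sh
# publish.sh (hypothetical name): fetch a page into a temporary file and
# replace the live file only if the fetch produced something.
# Usage: publish.sh <file> <url>

publish() {
    dst=$1
    url=$2
    # The temp file sits next to the target, so mv is a rename on the
    # same filesystem and visitors never see a partial page.
    tmp="$dst.tmp.$$"

    # FETCH is an assumption of this sketch: any command that takes
    # <file> <url> in that order. It defaults to wget, as used above.
    if ${FETCH:-wget -q -O} "$tmp" "$url" && [ -s "$tmp" ]; then
        mv "$tmp" "$dst"
    else
        rm -f "$tmp"
        echo "publish: fetch of $url failed, keeping existing $dst" >&2
        return 1
    fi
}

# Run only when called with the two expected arguments.
if [ "$#" -eq 2 ]; then
    publish "$1" "$2"
fi
```

From cron you would then call /path/publish.sh /path/to/the/resulting/file.htm http://www.mysite.com/script.php instead of calling wget directly. If the fetch fails, the old page stays in place.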
But some web hosts don't allow wget or lynx. What do you do then?
RichardP, the guy who runs WikiMinion (a robot that removes spam from wikis), created a script for me. It's in Perl, and it needs execute permission (chmod 755, for example) in order to work. It also relies on the LWP modules, so those have to be installed on the host. Save the script below as fetchurl.pl
#!/usr/bin/perl
# ===========================================================================
# fetchurl.pl - a very simple replacement for wget -O <file> <url>
# Usage: fetchurl.pl <file> <url>
# ===========================================================================
package main;
use 5.006_000;
use diagnostics;
use strict;
use warnings;
# ===========================================================================
# Modules
# ===========================================================================
use HTTP::Request;
use LWP::UserAgent;
# ===========================================================================
# Subroutines
# ===========================================================================
sub FetchFile($$)
{
    my ($src_url, $dst_file) = @_;
    my $error = '';
    my $ua = LWP::UserAgent->new;
    my $request = HTTP::Request->new(GET => $src_url);
    # issue GET request
    my $response = $ua->request($request);
    if ($response->is_success)
    {
        # write HTML content to file; the three-argument open keeps odd
        # characters in the file name from being read as an open mode
        if (open(my $fh_out, '>', $dst_file))
        {
            print $fh_out $response->content;
            close($fh_out);
        }
        else
        {
            $error = "can't open datafile '$dst_file' ($!)";
        }
    }
    else
    {
        $error = "attempt to retrieve '$src_url' failed (" . $response->status_line . ")";
    }
    if ($error ne '')
    {
        print STDERR "Error: $error\n";
    }
    # true on success, false on any error
    return ($error eq '');
}
sub DoUsage()
{
    print STDERR "Usage: fetchurl <file> <url>\n";
    print STDERR " <file> file to create or overwrite\n";
    print STDERR " <url> URL to fetch.\n";
}
sub Main()
{
    my @arguments = @ARGV;
    # exactly two arguments must be supplied (<file> and <url>)
    if (scalar(@arguments) != 2)
    {
        DoUsage();
        # non-zero exit status so cron and shell scripts can see the failure
        exit 1;
    }
    my $dst_file = $arguments[0];
    my $src_url = $arguments[1];
    # check if the source URL argument looks like a URL
    # (an optional port and a bare trailing slash are accepted)
    if ($src_url !~ m{^https?://[\w\-.]+(:\d+)?(/\S*)?$})
    {
        print STDERR "Error: fetchurl doesn't recognize '$src_url' as a URL.\n";
        DoUsage();
        exit 1;
    }
    # fetch the URL and write it to the file
    my $success = FetchFile($src_url, $dst_file);
    return $success;
}
# ===========================================================================
# Call Main
# ===========================================================================
# exit status 0 on success, 1 on failure
exit(Main() ? 0 : 1);
Run it from cron this way:
/path/fetchurl.pl /path/to/the/resulting/file.htm http://www.mysite.com/script.php
You still use the same PHP script to build the page and do the includes.
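In the crontab itself, five schedule fields go in front of the command. A minimal sketch, assuming standard crontab syntax and the placeholder paths used above. Install only one of the two entries (with crontab -e, or through your host's control panel if it has a cron page):

```
# minute  hour  day-of-month  month  day-of-week  command

# at the top of every hour
0 * * * * /path/fetchurl.pl /path/to/the/resulting/file.htm http://www.mysite.com/script.php

# or: on the hour and at half past
0,30 * * * * /path/fetchurl.pl /path/to/the/resulting/file.htm http://www.mysite.com/script.php
```

Cron runs commands with a very small environment, so absolute paths for both the script and the output file are the safe choice.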
It works for me, and hopefully it'll solve problems for others whose hosts remove wget without warning (I hate it when that happens). It should also help anyone looking for a way to build a single static page from dynamic content, say every hour or every 30 minutes.
This page was created by Ann Elisabeth Nordbo
and has its home at http://www.annelisabeth.com/
Updated 11.02.2005
Premiere issue October 2005