[How To] Set up 404 tracking on web pages

Herbalist Dr MziziMkavu

Feb 3, 2009
Description:
How to set up 404 error tracking with PHP and Apache Server.
***********
If you've noticed an abnormal number of 404 errors on your site, something is probably wrong with it, such as an outdated link or a missing file. Diagnosing the problem isn't easy when you have to dig through huge log files looking for one or two lines. That's why it helps to have an easy way to track which URL visitors requested when they got the 404 error. This tracking requires an Apache server with PHP support. Let's start:



  • Open a text editor (any editor will work) and enter this as the page content:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<title>404 Page not Found</title>
</head>
<body>
<font size="2" face="Verdana">
<b>404 Page Not Found</b><br /><br />

The page you requested does not exist. Please use your browser's Back button to return to the previous page, or click the Refresh button in your browser and try loading the page again. If the problem persists, please go back to the home page.
</font>
<?php
// The URL the visitor requested.
$url = $_SERVER["REQUEST_URI"];
// The referrer and remote host are not always set, so fall back to defaults.
$referrer = !empty($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "Unknown";
$remoteHost = isset($_SERVER["REMOTE_HOST"]) ? $_SERVER["REMOTE_HOST"] : "";
// Skip requests that are known noise rather than broken links.
if ($url != "/sitemap.rdf"
&& stristr($url, '/_vti_bin/') === FALSE
&& stristr($url, '/siteinfo.xml') === FALSE
&& stristr($url, '/MSOffice/clt') === FALSE
){
// Email the details of the failed request.
mail("my@emailaddress", "Page Not Found",
"Requested Page: " . $url
. "\r\nReferred By: " . $referrer
. "\r\nRemote Addr: " . $_SERVER["REMOTE_ADDR"] . " (" . $remoteHost . ")"
. "\r\n"
. "Cookies: \r\n"
. implode(",", $_COOKIE)
. "\r\nRequest URI: " . $_SERVER["REQUEST_URI"]
. "\r\n"
,"From: my@emailaddress" );
}
?>
</body>
</html>
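
If the list of URLs to ignore grows, it may be easier to keep it in one small function. This is only a rough sketch: the function name should_report_404 and the two array names are made up for illustration, and the entries simply mirror the URLs skipped in the script above.

```php
<?php
// Hypothetical helper: returns true when a 404 for $url should be reported.
// The ignore lists below only mirror the URLs skipped in the script above.
function should_report_404(string $url): bool
{
    // Paths that must match exactly to be ignored.
    $ignoredExact = array('/sitemap.rdf');
    // Fragments that are ignored wherever they appear (case-insensitive).
    $ignoredFragments = array('/_vti_bin/', '/siteinfo.xml', '/MSOffice/clt');

    if (in_array($url, $ignoredExact, true)) {
        return false;
    }
    foreach ($ignoredFragments as $fragment) {
        if (stripos($url, $fragment) !== false) {
            return false;
        }
    }
    return true;
}
```

In 404.php the long if condition could then be replaced with if (should_report_404($url)), and new entries would only need to be added to the arrays.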



  • There are various aspects that you can customize:
  • First of all, you can customize details such as the font and the 404 message by editing a bit of HTML.
  • You must change the email address in the PHP script from "my@emailaddress" to your own email address.
  • You can also change the email subject from "Page Not Found" to whatever you like.
  • You may consider removing the "Cookies: \r\n" . implode(",", $_COOKIE) part of the script to protect visitors' privacy.
  • Save the file as 404.php and move on to .htaccess.
  • Open a new document in a text editor and enter this line:

ErrorDocument 404 /404.php


  • Save the file as .htaccess. If your editor adds a .txt extension to the file, remove it and make sure the file is named exactly .htaccess.
  • Upload both 404.php and .htaccess to the root of your web space.
  • Now test the 404 tracker by browsing to a page that does not exist on your site, for example /something-random-like-this on Yoursite.Com.
  • The 404 page should appear with the text on it.
  • Now check the email account that you put in the PHP file. You should see a message with the subject "Page Not Found" (the subject will be different if you set a custom subject in the PHP file).
  • The contents of the email should be something like this:
Requested Page: /something-random-like-this
Referred By: Unknown
Remote Addr: 65.78.464.43 ()
Cookies:
Request URI: /something-random-like-this


  • The Requested Page and Request URI fields show the address the visitor requested when they got the 404 error.
  • The Referred By field shows the referrer to the page, if it is known.
  • The Remote Addr field shows the IP address of the visitor who got the 404. The parentheses after it hold the remote host name, which is empty when the server does not provide one.
  • The Cookies field (although empty in this example) shows the cookies the visitor's browser sent to your site. As mentioned in the editing instructions, it may be wise to remove the Cookies part from the PHP script, because cookies can carry sensitive data about the user, such as session identifiers or login tokens.
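
To make the fields above a bit more concrete, here is a minimal sketch of how the email body could be built without the Cookies part. The function name build_404_report is hypothetical; it takes a $_SERVER-style array so it can be tried out without a real request, and it falls back to defaults when a key is missing.

```php
<?php
// Hypothetical helper: builds the email body from a $_SERVER-style array.
// Cookies are left out on purpose, and missing keys fall back to defaults.
function build_404_report(array $server): string
{
    $url      = isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : 'Unknown';
    $referrer = !empty($server['HTTP_REFERER']) ? $server['HTTP_REFERER'] : 'Unknown';
    $addr     = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : 'Unknown';
    // REMOTE_HOST is usually only set when the server performs hostname lookups.
    $host     = isset($server['REMOTE_HOST']) ? $server['REMOTE_HOST'] : '';

    return "Requested Page: " . $url
        . "\r\nReferred By: " . $referrer
        . "\r\nRemote Addr: " . $addr . " (" . $host . ")"
        . "\r\n";
}
```

In 404.php it could be used as mail("my@emailaddress", "Page Not Found", build_404_report($_SERVER), "From: my@emailaddress");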

This should help you track down 404 errors caused by human visitors. More often than not, though, bots are behind an abnormal number of 404 hits, and those cases are generally harder to solve: you will have to do a lot of research and look through the logs to figure out the problem.
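
If you do end up going through the logs, a small script can at least do the counting for you. The following is only a sketch: the function name count_404s is made up, and it assumes the access log uses Apache's Common or Combined Log Format, which may not match your server's configuration.

```php
<?php
// Hypothetical helper: tallies 404 responses per requested URL.
// Assumes each line is in Apache's Common or Combined Log Format, e.g.
// 192.0.2.1 - - [03/Feb/2009:10:00:00 +0000] "GET /missing HTTP/1.1" 404 209
function count_404s(array $lines): array
{
    $counts = array();
    foreach ($lines as $line) {
        // Capture the request path, then require status 404 right after the quoted request.
        if (preg_match('/"[A-Z]+ (\S+)[^"]*" 404 /', $line, $m)) {
            $url = $m[1];
            $counts[$url] = isset($counts[$url]) ? $counts[$url] + 1 : 1;
        }
    }
    arsort($counts); // most frequently hit URLs first
    return $counts;
}
```

It could be called as count_404s(file('/path/to/access.log')), where the path is a placeholder for wherever your host stores the access log. URLs that show up hundreds of times with no referrer are a likely sign of a bot.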
 