Understanding Web Statistics and Log Files


Starting with the most basic statistics, we will work through some of the statistics, ratios, charts and graphs you should keep. There is no one right way to keep these records. There are, however, a few guidelines that apply to just about every situation.

  • Don’t measure for the sake of measuring. Tracking a specific statistic or trend should lead you to a goal. Set the goals for the site and track the numbers that will help you achieve your goals.
  • Don’t measure things that you have no control over. This is a real waste of time and will, as often as not, keep you focused on the wrong things.
  • As the goals of your site change, the things you measure should also change. Just because you choose a measure now does not mean that you will keep it forever. Change is the only constant.
  • Build a baseline. Before you start making massive changes, build a baseline of measures. This will ensure that when you do start making changes you will have a frame of reference.
  • Don’t wait forever. You want to have a baseline, but you don’t want to wait months to build it before you make any changes.
  • Building your site is an iterative process. After developing short- and long-term goals and defining your improvement cycle, expect change. Don’t be afraid of changing your methodology, your content, the entire site… if it makes sense.

Most of this can be boiled down to a single sentence: set your goals, keep your measures and make changes as needed. It sounds simple, but in reality fewer than one in a hundred commercial sites use any kind of process for tracking and measuring the success of their site or changes to the site. Many look at their site stats once a month. Very few use those numbers to make marketing decisions, and fewer still use them to determine the success of those decisions.

Most of the numbers you use to measure the success of your site will come from your web site log files. Log files are generally text files that hold information about every request to your site. In web-speak, everything is a request. When someone enters your URL in their browser, the browser goes to your site and requests the page located at that address. The HTML that makes up the page then triggers further requests for the images, sound files or other elements on the page. Each request is completely separate from all of the other requests. When you look at your log files you will see that each line in the log is a request for some asset on your site, such as a page or an image. This is referred to as a stateless connection. In layman’s terms, your web server does not know that the page was requested by the same person who requested an image on that page.

To maintain state, web sites save small files on the visitor’s computer. These files are called cookies. A cookie is normally no more than a small file holding a piece or two of information that lets the web program know that the same person is requesting a number of different pages.

There are two types of cookies. Session cookies store a session ID that is checked by some web pages and lets the program know that it is the same person looking at the page. A session cookie contains no personal information and can only be read by the site that wrote it. Data cookies can contain personal data. Like session cookies, data cookies can only be read by the site that wrote them. These cookies generally contain data that helps the web site program provide information that suits the user’s needs.
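
To make the idea concrete, here is a minimal Python sketch of how a web program might use a session cookie to tie separate requests together. The cookie name session_id and the function session_id_for are made up for illustration; real sites pick their own names and usually add expiry and security settings on top of this.

```python
from http.cookies import SimpleCookie
import uuid

def session_id_for(cookie_header):
    """Return (session_id, cookie_to_send) for one incoming request.

    cookie_to_send is None when the browser already presented a session ID.
    """
    cookie = SimpleCookie(cookie_header or "")
    if "session_id" in cookie:
        # Returning visitor: reuse the ID the browser sent back.
        return cookie["session_id"].value, None
    # First request: invent an ID and ask the browser to store it.
    new_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply["session_id"] = new_id
    return new_id, reply["session_id"].OutputString()

# The first request arrives with no cookie, so an ID is issued.
first_id, set_cookie = session_id_for(None)
# The browser sends the cookie back with its next request.
second_id, nothing_new = session_id_for(set_cookie)
print(first_id == second_id)
```

The server still treats each request as separate; it is only the ID carried in the cookie that lets the program recognize the two requests as coming from the same browser.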

Back to log files. The following is a sample of an entire log entry from a real log file. The at sign (@) marks the beginning and end of the log entry. There are tens of thousands of entries like this in your log files. It should be easy to see why just having access to your log files is of little use without a log analysis program to sort through all of the information.

@ 2001-12-03 22:17:02 208.25.247.180 - W3SVC1 GET /_admin/ - 302 304 555 0 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT) newscorePartnerID=newscore;+newscoreZPI=newscore; +EZNewsUSERID=ZPI%3Dnewscore%26USERID%3Dwebtrans%26; +OptInEmailUserID=USERID%3D1013%26USEREMAIL%3Dbarryb% 40webtransitions%2Ecom%26;+WebTransSupportID=webtrans -@

The first piece of information you see is the date and time the request was made. (2001-12-03 22:17:02)

Next comes the requesting IP address. (208.25.247.180)

The next important piece of information is about the HTTP request (the item being requested) and the requesting browser. (GET /_admin/ - 302 304 555 0 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT))

In this example the last piece of information is info pulled from the site cookie. (newscorePartnerID=newscore;+newscoreZPI=newscore; +EZNewsUSERID=ZPI%3Dnewscore%26USERID%3Dwebtrans%26; +OptInEmailUserID=USERID%3D1013%26USEREMAIL%3Dbarryb% 40webtransitions%2Ecom%26;+WebTransSupportID=webtrans)
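
If you want to pull these pieces apart yourself, a short Python sketch along these lines will do it. The field names and their order are assumptions read off the sample entry (the meaning of the four numbers after the request, for example, is a guess); your own server defines its own layout, so check before relying on it. The cookie in the sample line is shortened to keep it readable.

```python
from urllib.parse import unquote_plus

# Assumed field order, inferred from the sample entry. A real server
# declares its own order, so adjust this list to match your logs.
FIELDS = [
    "date", "time", "client_ip", "username", "site_name", "method",
    "uri_stem", "uri_query", "status", "bytes_sent", "bytes_received",
    "time_taken", "protocol", "user_agent", "cookie", "referrer",
]

def parse_entry(line):
    """Split one space-delimited log entry into a dict keyed by FIELDS."""
    values = line.strip().strip("@").split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    entry = dict(zip(FIELDS, values))
    # In the sample, spaces inside a field appear as "+" and some
    # characters as %XX codes; decode both for readability.
    entry["user_agent"] = unquote_plus(entry["user_agent"])
    entry["cookie"] = unquote_plus(entry["cookie"])
    return entry

sample = (
    "2001-12-03 22:17:02 208.25.247.180 - W3SVC1 GET /_admin/ - 302 304 555 0 "
    "HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT) "
    "newscorePartnerID=newscore;+newscoreZPI=newscore -"
)
entry = parse_entry(sample)
print(entry["date"], entry["time"], entry["client_ip"])
print(entry["method"], entry["uri_stem"], entry["status"])
print(entry["user_agent"])
```

A dash in a field means the server had nothing to record there. The parser raises an error on any line that does not have the expected number of fields rather than guessing, which is the safer choice when the layout is uncertain.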

The following list gives you a few examples of log analysis programs with a URL for more information. Costs run from less than a hundred dollars to thousands.

  • Webtrends – www.webtrends.com
  • 123loganalyzer - www.123loganalyzer.co
  • webstat - www.webstat.com
  • faststats – www.mach5.com
  • sawmill - www.sawmill.net

Log analysis programs can be a little difficult to master. To measure the effect of your marketing program you will need to understand at least a little about the statistics. Trying to look at all of the information available is generally overkill. It is also a waste of time you could be spending on more productive activities (like marketing).

Most hosting companies will provide you with the statistics you need to do most of the basic tracking for your site. There will be some cases, however, where the statistics offered by your hosting company will not do. For instance, most statistics programs will show you only the top 10 or 20 pages with the most hits. If you are running an ad that is linked to a special page on your site, you will need to know the number of hits to that page regardless of where it falls in the rankings. If you need these types of statistics you will almost certainly need to upgrade the statistics reports you get from your hosting company.
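
With access to the raw log files, counting requests for one particular page is straightforward. The Python sketch below assumes you have already extracted the requested path from each log line; the paths themselves are invented for illustration.

```python
from collections import Counter

# Hypothetical requested paths, one per log line.
requested_paths = [
    "/index.html", "/index.html", "/index.html",
    "/products.html", "/products.html",
    "/promo/ad-landing.html",
]

counts = Counter(requested_paths)

# A typical "top pages" report only shows the head of the list...
print(counts.most_common(2))
# ...but you can look up any page directly, however low it ranks.
print(counts["/promo/ad-landing.html"])
```

Because Counter returns 0 for a path it has never seen, the same lookup also tells you when an advertised page has had no requests at all.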

The image below is a view of the basic statistics information you might see from your hosting company. The chart at the top gives you a graphic view of visitor sessions for the month of December 2001. A visitor session is a visit to your site by one person. If the person looks at a hundred pages while they are on your site, it is still counted as a single visitor session.

The numbers under the chart indicate some of the basic statistics of the site through the month of December 2001.

  • Hit – a request for any file. If a page has ten images on it, a request for that page will result in 11 hits: one for the page and one for each image or other asset on the page.
  • Page View – a request for a single page. This is counted as a single view regardless of how many images are on the page.
  • Visitor Session – a visit to your site by one person. If the person looks at a hundred pages while they are on your site, it is still counted as a single visitor session.
  • Visitor – A person that visits the site.
  • Unique Visitors – the number of individuals who visited the site. If a person comes more than once to the site they are still only counted as one unique visitor.
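
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IP address stands in for the visitor (real analysis programs may use cookies instead), a page is anything whose path ends in one of the listed suffixes, and a new session starts after 20 minutes of inactivity. All three choices are adjustable, and the sample requests are invented.

```python
from datetime import datetime, timedelta

# Assumption: paths ending in these suffixes count as pages.
PAGE_SUFFIXES = ("/", ".htm", ".html", ".asp")

def summarize(requests, idle_gap=timedelta(minutes=20)):
    """requests: iterable of (timestamp, visitor_key, path) tuples."""
    hits = 0
    page_views = 0
    sessions = 0
    last_seen = {}  # visitor_key -> time of that visitor's previous request
    for when, visitor, path in sorted(requests):
        hits += 1  # every request is a hit
        if path.lower().endswith(PAGE_SUFFIXES):
            page_views += 1
        previous = last_seen.get(visitor)
        # A first request, or one after a long gap, opens a new session.
        if previous is None or when - previous > idle_gap:
            sessions += 1
        last_seen[visitor] = when
    return {
        "hits": hits,
        "page_views": page_views,
        "visitor_sessions": sessions,
        "unique_visitors": len(last_seen),
    }

start = datetime(2001, 12, 3, 22, 0, 0)
sample = [
    (start, "208.25.247.180", "/index.html"),
    (start + timedelta(seconds=1), "208.25.247.180", "/images/logo.gif"),
    (start + timedelta(minutes=5), "208.25.247.180", "/products.html"),
    (start + timedelta(hours=2), "208.25.247.180", "/index.html"),
    (start + timedelta(minutes=10), "192.0.2.7", "/index.html"),
]
summary = summarize(sample)
print(summary)
```

In the sample, the first visitor returns two hours after their last request, so they account for two visitor sessions but only one unique visitor.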

For many sites, this level of reporting is all you will need. However, having your own analysis program and access to your log files will allow you to look at your site statistics based on your specific needs. As an alternative, ask your hosting company if they can supply a more detailed analysis program. Let’s look at some of the more common statistics available through better web analysis programs.

Your site statistics probably contain terms that you are not familiar with. Let’s take a look at a few and what they mean.

  • Hits: This is a pretty useless statistic. Unless otherwise defined, a hit is any call to any resource on your web site. For instance, if one of your web pages has 10 images and 10 buttons on it, the log file will contain 21 hits each time that page is loaded: one hit for each resource called. Web servers are stateless. That basically means that each request is an isolated incident. The server does not know that the requests for each of the 10 images came from the same page load. It only knows that a request was made and that it sent the image.
  • Page Impressions or Page Hits: These are important hits. Only calls to web pages are recorded here.
  • Visitor Sessions: Another important statistic. A visitor session is one visitor to the site regardless of the number of pages he/she looked at.
  • Unique Visitors: This indicates the number of unique visitors to the site. You do need to remember that any visitor to the site is included in the Unique Visitor list. This includes search engine robots that index your site.
  • Visitors who visited more than once: This is the number of visitors that have more than one Visitor Session. It means that the visitor came to your site, looked around and left. Then, at a later time (generally more than 20 minutes), they came back to your site again.
  • Visitors who only visited once: Just as indicated, these visitors only visited your site once within the date range of your report. It does not mean that they did not visit you last month.
  • Most Requested Pages: This gives you a list of the most visited pages in your site. While this is not generally an indication of preference, it can tell you if a promotion you are running is working. For instance, if you add a page to your site on a specific topic and use a link to this page in your advertising, the number of hits to this page will help determine the success of your advertising.
  • Least Requested Pages: Again, this is not necessarily an indication of preference, but you should look at these pages. If some of your more important information is on these pages, you may do well to make the links to them more obvious.
  • Top Entry Pages: An entry page is the first page a person goes to at the beginning of a Visitor Session. These are important pages because they tell you what pages have been bookmarked or where search engines are sending people. Keep an eye on these pages. If you have one page that is particularly high, you may consider changing its content to include the information you want everyone to see.
  • Visitors by number of visits: This is a good indication of how many people are coming back to your site again and again. Generally you will want to increase the number of visits per person.
  • Top Visitors: While this sounds like a great bit of information, it is generally not worth much. A lot of the visitors to your site will be cached. This is because most of the bigger ISPs don’t send a surfer’s request straight to your server. Instead the requests are sent to a "cache" machine that makes the request, pulls the page down and stores it inside the ISP’s network. This "cache" machine then serves the page to the surfer and to other surfers from the same network, so many different people can show up in your logs as a single visitor.
  • Visitors by hour, day, week: While these statistics may be useful for very active sites, most small businesses will not have to worry too much about them. They may be interesting but will not help you much until you are getting hundreds of thousands of hits and are running your own servers.
  • Page not found errors: This list gives you a cross section of calls to your web server that did not go through. There are a great number of errors that don’t mean a lot except to the administrator of the web server. This is the one area where it is easy to spot certain types of hackers. The page errors you are interested in are errors calling specific pages and/or images on your site. All of these errors will begin with no slash or a forward slash and will include the names of files in your site. You may also see a number of errors with forward and back slashes and calls to the operating system root. You can disregard most of these unless you are responsible for the security of the web server.
  • Top Referring sites: This is a good list to look at. It will show you who is linking to your site and how many people are using the link. The most common referring sites are your own site and No Referrer. The rest will probably be search engines.
  • Top Search Engines: This list gets rid of all referrers except search engines and will give you a quick view of which search engines are doing the best for you.
  • Top Keywords: LOOK AT THIS LIST EVERY MONTH. Keywords give you a glimpse into what people are using to find your site. If you are actively looking for more traffic these keywords will help you determine what keywords to register with search engines or use in pay-per-click engines.
  • Visiting Spiders: This is a list of the spiders that have visited your site. Spiders are programs run by search engines that come to your site and follow each link, indexing the words on each page. If you are actively registering your site you will be able to tell when a specific search engine has spidered your site. This lets you know when to start checking to see how the registration worked.
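
As an illustration of how reports like Top Referring Sites and Top Keywords can be built, the Python sketch below tallies referrers and pulls search phrases out of their URLs. It assumes the referrer has already been extracted from each log line, that a dash means no referrer was recorded, and that search engines pass the phrase in a parameter named q, query or p. Those parameter names vary from engine to engine and are a guess here, and the example hosts are placeholders.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumption: the search phrase travels in one of these parameters.
QUERY_PARAMS = ("q", "query", "p")

def referrer_report(referrers):
    """Return (sites, keywords) counters for a list of referrer strings."""
    sites = Counter()
    keywords = Counter()
    for ref in referrers:
        if not ref or ref == "-":
            sites["No Referrer"] += 1
            continue
        parts = urlsplit(ref)
        sites[parts.netloc.lower()] += 1
        params = parse_qs(parts.query)  # also turns "+" back into spaces
        for name in QUERY_PARAMS:
            for phrase in params.get(name, []):
                keywords[phrase.lower()] += 1
    return sites, keywords

# Hypothetical referrer values, for illustration only.
referrers = [
    "-",
    "http://search.example.com/search?q=log+file+analysis",
    "http://search.example.com/search?q=Log+File+Analysis",
    "http://www.example.org/links.html",
    "-",
]
sites, keywords = referrer_report(referrers)
print(sites.most_common())
print(keywords.most_common())
```

Lowercasing the phrases means that the same search typed with different capitalization is counted once, which keeps the keyword list short enough to read every month.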

Now that we know a little more about what statistics are available through the analysis of log files we can start to make some decisions about what measures we want to keep on our site. Some of these measurements can be taken straight from the analysis program. Total visitors or visitor sessions can be pulled straight from the stats program and plotted on a simple bar or line graph.

At this point most people ask why they need to graph the stats since they already have the numbers. After all, aren’t the numbers what we need to look at? The simple answer is NO. Generally the individual numbers mean much less than the numbers over time. That’s what a graph does for you. It displays numbers over time, giving you an easy-to-read graphical reference. Sure, knowing that you had 10,000 unique visitors is important. Knowing whether the number is increasing or decreasing over time is more important.
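
A graph does not need special software. The Python sketch below computes the change from one period to the next and prints a rough text chart. The monthly figures are made up for illustration, and the function names are just labels chosen for this example.

```python
def month_over_month(counts):
    """counts: list of (label, value) in time order.

    Returns a list of (label, value, percent_change) tuples.
    """
    rows = []
    previous = None
    for label, value in counts:
        # No change figure for the first period, or when the previous value was 0.
        change = (value - previous) * 100 / previous if previous else None
        rows.append((label, value, change))
        previous = value
    return rows

def print_trend(rows, width=40):
    """Print one bar per period, scaled so the largest value fills the width."""
    biggest = max(value for _, value, _ in rows) or 1
    for label, value, change in rows:
        bar = "#" * round(value / biggest * width)
        note = "" if change is None else f" ({change:+.1f}%)"
        print(f"{label:<4}{value:>7} {bar}{note}")

# Made-up monthly unique-visitor counts, for illustration only.
visitors = [("Oct", 8000), ("Nov", 10000), ("Dec", 9000)]
trend = month_over_month(visitors)
print_trend(trend)
```

Seen on its own, the 10,000 figure for November says little; next to the months on either side it shows a rise followed by a dip, which is the kind of movement you want to be able to tie back to a change you made.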
