Understanding Web Statistics and Log Files
Starting with the most basic statistics, we will work through some of the statistics, ratios, charts and graphs you should keep. There is no one right way to keep these records. There are, however, a few guidelines that do apply to just about every situation.
- Don’t measure for the sake of measuring. Tracking a specific statistic or trend should lead you to a goal. Set the goals for the site and track the numbers that will help you achieve them.
- Don’t measure things that you have no control over. This is a real waste of time and will, as often as not, keep you focused on the wrong things.
- As the goals of your site change, the things you measure should also change. Just because you choose a measure now does not mean that you will keep it forever. Change is the only constant.
- Build a baseline. Before you start making major changes, build a baseline of measures. This will ensure that when you do start making changes you will have a frame of reference.
- Don’t wait forever. You want to have a baseline, but you don’t want to wait for months to build it before you make any changes.
- Building your site is an iterative process. After developing short- and long-term goals and defining your improvement cycle, expect change. Don’t be afraid of changing your methodology, your content, or the entire site if it makes sense.
Most of this can be boiled down to a single sentence: set your goals, keep your measures and make changes as needed. It sounds simple, but in reality fewer than one in a hundred commercial sites use any kind of process for tracking and measuring the success of their site or of changes to the site. Many look at their site stats once a month. Very few use those numbers to make marketing decisions, and fewer still use them to determine the success of those decisions.
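Building a baseline and then judging a change against it comes down to simple arithmetic. The Python sketch below is one way to express it: it averages a few baseline periods and reports the percent change of a later period. The function name and the weekly numbers are hypothetical, and using a plain average (rather than, say, a median) is an assumption.

```python
from statistics import mean

def percent_change_from_baseline(baseline, current):
    """Percent change of `current` relative to the average of `baseline`.

    baseline: numbers recorded before any changes (e.g. weekly visitor sessions).
    current:  the same measure recorded after a change.
    """
    if not baseline:
        raise ValueError("need at least one baseline period")
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline average is zero; percent change is undefined")
    return (current - base) / base * 100

# Hypothetical figures: four weeks of visitor sessions before a site change,
# then the first week after it.
baseline_weeks = [410, 395, 420, 415]
after_change = 482
print(f"Change from baseline: {percent_change_from_baseline(baseline_weeks, after_change):+.1f}%")
# prints: Change from baseline: +17.6%
```

A single week above the baseline can be noise, so it is worth watching the same figure for several periods before deciding a change worked.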
Most of the numbers you use to measure the success of your site will come from your web site log files. Log files are generally text files that hold information about every request to your site. In web-speak, everything is a request. When someone enters your URL in their browser, the browser goes to your site and requests the page located at that address. The browser then reads the HTML that makes up the page and makes further requests for the images, sound files or other elements it references. Each request is completely separate from all of the other requests. When you look at your log files you will see that each line in the log is a request for some asset on your site, such as a page or an image. This is what is meant by a stateless connection. In layman’s terms, your web server does not know that the page was requested by the same person who requested an image on that page.
To maintain state, web sites ask the browser to save small files on the visitor’s computer. These files are called cookies. A cookie is normally no more than a small file with a piece or two of information that allows the web program to know that the same person is requesting a number of different pages.
There are two types of cookies. Session cookies store a session ID that is checked by some web pages and lets the program know that it is the same person looking at the page. A session cookie contains no personal information and can only be read by the site that writes it. Data cookies can contain personal data. Like session cookies, data cookies can only be read by the site that writes them. These cookies generally contain data that helps the web site program provide information that suits the user’s needs.
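To see what a session cookie looks like in practice, here is a minimal sketch using Python’s standard `http.cookies` module. The server sends a `Set-Cookie` header containing a random session ID; the browser then returns it in a `Cookie` header with later requests, which is how the program ties separate requests to the same visitor. The cookie name `sessionid`, the extra `theme` cookie and the attributes chosen are assumptions for illustration, not what any particular site uses.

```python
import secrets
from http.cookies import SimpleCookie

COOKIE_NAME = "sessionid"  # hypothetical cookie name

def make_session_cookie():
    """Create a random session ID and the Set-Cookie header that stores it."""
    session_id = secrets.token_hex(16)       # 32 random hex characters, no personal data
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = session_id
    cookie[COOKIE_NAME]["path"] = "/"        # browser returns it for every page on the site
    cookie[COOKIE_NAME]["httponly"] = True   # page scripts cannot read it
    return session_id, cookie.output()

def session_id_from_request(cookie_header):
    """Read the session ID back out of the Cookie header a browser sends."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel else None

# The browser echoes the cookie back, possibly alongside other cookies.
sid, header = make_session_cookie()
print(header)
print(session_id_from_request(f"theme=plain; {COOKIE_NAME}={sid}") == sid)  # True
```

The ID itself carries no meaning; it is only a key the site program uses to look up what it already knows about that visit.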
Back to log files. The following is a sample of an entire log entry from a real log file. The at sign (@) marks the beginning and end of the log entry. There are tens of thousands of entries like this in your log files. It should be easy to see why just having access to your log files is of little use without a log analysis program to sort through all of the information.
@ 2001-12-03 22:17:02 208.25.247.180 - W3SVC1 GET /_admin/ - 302 304 555 0 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT) newscorePartnerID=newscore;+newscoreZPI=newscore;+EZNewsUSERID=ZPI%3Dnewscore%26USERID%3Dwebtrans%26;+OptInEmailUserID=USERID%3D1013%26USEREMAIL%3Dbarryb%40webtransitions%2Ecom%26;+WebTransSupportID=webtrans -@
The first piece of
information you see is the date and time the request was made.
(2001-12-03 22:17:02)
Next comes the
requesting IP address. (208.25.247.180)
The next important
piece of information is about the HTTP request (the item being
requested) and the requesting browser. (GET /_admin/ - 302 304 555 0
HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT))
In this example the last piece of information is the content of the site’s cookie. (newscorePartnerID=newscore;+newscoreZPI=newscore;+EZNewsUSERID=ZPI%3Dnewscore%26USERID%3Dwebtrans%26;+OptInEmailUserID=USERID%3D1013%26USEREMAIL%3Dbarryb%40webtransitions%2Ecom%26;+WebTransSupportID=webtrans)
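Log analysis programs start by splitting each entry into named fields. The Python sketch below shows the idea for the W3C extended format written by Microsoft IIS (the `W3SVC1` in the sample is the IIS web service name). In that format a `#Fields:` line at the top of the file names the columns in order, and fields are separated by spaces, which is why spaces inside values such as the browser name appear as `+`. The `#Fields:` line used here is an assumption chosen to match the sample’s sixteen values; in particular, reading `302 304 555 0` as status, bytes sent, bytes received and time taken is a guess, so check the directive in your own log files.

```python
# Assumed field layout: IIS writes a "#Fields:" directive at the top of each
# W3C extended log, and the real order depends on the server's logging settings.
FIELDS_DIRECTIVE = ("#Fields: date time c-ip cs-username s-sitename cs-method "
                    "cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes "
                    "time-taken cs-version cs(User-Agent) cs(Cookie) cs(Referer)")

def parse_w3c_log(lines):
    """Yield one dict per request, keyed by the names in the latest #Fields: line."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):        # other directives: #Software, #Version, #Date
            continue
        values = line.split()           # fields are space-separated; "-" means empty
        if len(values) == len(fields):  # skip lines that don't match the layout
            yield dict(zip(fields, values))

# The sample entry from the text, with the cookie shortened for readability.
sample = ("2001-12-03 22:17:02 208.25.247.180 - W3SVC1 GET /_admin/ - "
          "302 304 555 0 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT) "
          "newscorePartnerID=newscore;+WebTransSupportID=webtrans -")

entry = next(parse_w3c_log([FIELDS_DIRECTIVE, sample]))
print(entry["date"], entry["time"], entry["c-ip"], entry["cs-uri-stem"])
print(entry["cs(User-Agent)"].replace("+", " "))  # turn "+" back into spaces
```

Keying each value by the declared field name, rather than by position, keeps the parser working if the server is later configured to log a different set of fields.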
The following list
gives you a few examples of log analysis programs with a URL for more
information. Costs run from less than a hundred dollars to thousands.
- Webtrends – www.webtrends.com
- 123loganalyzer – www.123loganalyzer.co
- webstat – www.webstat.com
- faststats – www.mach5.com
- sawmill – www.sawmill.net
Log analysis
programs can be a little difficult to master. To measure the effect of
your marketing program you will need to understand at least a little
about the statistics. Trying to look at all of the information available
is generally overkill. It is also a waste of time you could be spending
on more productive activities (like marketing).
Most hosting companies will provide you with the statistics you need to do most of the basic tracking for your site. There will be some cases, however, where the statistics offered by your hosting company will not do. For instance, most statistics programs will show you the top 10 or 20 pages with the most hits. If you are running an ad that is linked to a special page on your site, you will need to know the number of hits to that page regardless of where it falls in the rankings. If you need these types of statistics, you will almost certainly need to upgrade the statistics reports you generally get from your hosting company.
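With access to your own log data, finding the hits to an ad’s landing page is a short calculation. This Python sketch counts requests for one page and reports where it ranks, so you get a number even when the page never makes a top-10 report. It assumes you already have the requested paths from your log (for example, the `cs-uri-stem` field in an IIS log); the page name `/spring-sale.htm` and the sample paths are hypothetical.

```python
from collections import Counter

def hits_and_rank(paths, page):
    """Return (hits, rank) for `page` among all requested paths.

    rank 1 is the most requested page; rank is None if the page never appears.
    Pages tied on hits share the same rank.
    """
    counts = Counter(paths)
    hits = counts[page]           # a Counter returns 0 for unseen keys
    if hits == 0:
        return 0, None
    # one plus the number of distinct pages requested more often than this one
    rank = 1 + sum(1 for n in counts.values() if n > hits)
    return hits, rank

# Hypothetical requested paths pulled from a month of log entries.
paths = ["/", "/", "/", "/about.htm", "/about.htm", "/spring-sale.htm"]
print(hits_and_rank(paths, "/spring-sale.htm"))  # (1, 3)
```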
The image below is a view of the basic statistics information you might see from your hosting company. The chart at the top gives you a graphic view of visitor sessions for the month of December 2001. A visitor session is a visit to your site by one person. If the person looks at a hundred pages while they are on your site, it is still counted as a single visitor session.
The numbers under the chart indicate some of the basic statistics of the site through the month of December 2001.
- Hit – a request for any file. If a page has ten images on it, a request for that page will result in 11 hits: one for the page and one for each image or other asset on the page.
- Page View – a request for a single page. This is counted as a single view regardless of how many images are on the page.
- Visitor Session – a visit to your site by one person. If the person looks at a hundred pages while they are on your site, it is still counted as a single visitor session.
- Visitor – a person who visits the site.
- Unique Visitors – the number of individuals who visited the site. If a person comes to the site more than once, they are still only counted as one unique visitor.
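These definitions can be turned into counts with a few lines of code. The following Python sketch is a simplified model of what an analysis program does, not how any particular product works. Its assumptions are labeled in the code: a visitor is identified by IP address, a “page” is any path ending in `/` or one of a few page extensions, and a gap of more than 20 minutes between requests starts a new visitor session. Because ISP caches and shared connections put many people behind one address, an IP address is only an approximation of a person.

```python
from datetime import datetime, timedelta

PAGE_EXTENSIONS = (".htm", ".html", ".asp", ".php")  # assumption: what counts as a page
SESSION_TIMEOUT = timedelta(minutes=20)               # assumption: gap that ends a visit

def is_page(path):
    return path.endswith("/") or path.lower().endswith(PAGE_EXTENSIONS)

def summarize(requests):
    """Count hits, page views, visitor sessions and unique visitors.

    requests: iterable of (visitor_id, timestamp, path) tuples, where
    visitor_id is assumed to be the client IP address from the log.
    """
    hits = page_views = sessions = 0
    last_seen = {}  # visitor_id -> time of that visitor's previous request
    for visitor, when, path in sorted(requests, key=lambda r: r[1]):
        hits += 1                              # every file requested is a hit
        if is_page(path):
            page_views += 1                    # images and other assets are not
        previous = last_seen.get(visitor)
        if previous is None or when - previous > SESSION_TIMEOUT:
            sessions += 1                      # first request, or back after a long gap
        last_seen[visitor] = when
    return {"hits": hits, "page_views": page_views,
            "visitor_sessions": sessions, "unique_visitors": len(last_seen)}

# Hypothetical traffic: one visitor loads a page with ten images, then returns
# two hours later; a second visitor views the home page once.
t0 = datetime(2001, 12, 3, 22, 17, 2)
requests = [("208.25.247.180", t0, "/index.htm")]
requests += [("208.25.247.180", t0 + timedelta(seconds=1), f"/images/pic{i}.gif")
             for i in range(10)]
requests += [("208.25.247.180", t0 + timedelta(hours=2), "/about.htm"),
             ("192.0.2.7", t0 + timedelta(minutes=5), "/")]
print(summarize(requests))
# {'hits': 13, 'page_views': 3, 'visitor_sessions': 3, 'unique_visitors': 2}
```

The page with ten images contributes 11 hits but only one page view, and the visitor who returns after two hours is one unique visitor with two visitor sessions.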

For many sites, this level of reporting is all you will need. However, having your own analysis program and access to your log files will allow you to look at your site statistics based on your specific needs. As an alternative, ask your hosting company if they can supply a more detailed analysis program. Let’s look at some of the more common statistics available through better web analysis programs.
Your site statistics probably contain terms that you are not familiar with. Let’s take a look at a few and what they mean.
- Hits: This is a pretty useless statistic. Unless otherwise defined, a hit is any call to any resource on your web site. For instance, if one of your web pages has 10 images and 10 button images on it, the log file will contain 21 hits each time that page is loaded: one hit for each resource called. Web servers are stateless. That basically means that each request is an isolated incident. The server does not know that the requests for each of the 10 images came from the same page load. It only knows that a request was made and that it sent the image.
- Page Impressions or Page Hits: These are the important hits. Only calls to web pages are recorded here.
- Visitor Sessions: Another important statistic. A visitor session is one visit to the site by one visitor, regardless of the number of pages he or she looked at.
- Unique Visitors: This indicates the number of different visitors to the site. Remember that any visitor to the site is included in the Unique Visitor list. This includes search engine robots that index your site.
- Visitors who visited more than once: This number indicates the number of visitors that have more than one Visitor Session. It means that the visitor came to your site, looked around and left, then at a later time (generally more than 20 minutes later) came back to your site again.
- Visitors who only visited once: Just as indicated, these visitors only visited your site once within the date range of your report. It does not mean that they did not visit you last month.
- Most Requested Pages: This gives you a list of the most visited pages on your site. While this is not generally an indication of preference, it can tell you if a promotion you are running is working. For instance, if you add a page to your site on a specific topic and use a link to this page in your advertising, the number of hits to this page will help determine the success of your advertising.
- Least Requested Pages: Again, this is not necessarily an indication of preference, but you should look at these pages. If some of your more important information is on these pages, you may do well to make the links to those pages more obvious.
- Top Entry Pages: Entry pages are the first pages people go to at the beginning of a Visitor Session. These are important pages because they tell you which pages have been bookmarked or where search engines are sending people. Keep an eye on these pages. If one page is particularly high, you may consider changing its content to include the information you want everyone to see.
- Visitors by number of visits: This is a good indication of how many people are coming back to your site again and again. Generally you will want to increase the number of visits per person.
- Top Visitors: While this sounds like a great bit of information, it is generally not worth much. Many of the requests to your site will come from cache machines rather than directly from visitors. This is because most of the bigger ISPs don’t send a surfer’s request straight to your server. Instead the requests are sent to a "cache" machine that makes the request, pulls the page down and stores it inside the ISP’s network. This "cache" machine then serves the page to the surfer and to other surfers from the same network. As a result, the top visitors on your list are often ISP cache machines standing in for many different people.
- Visitors by hour, day, week: While these statistics may be useful for very active sites, most small businesses will not have to worry too much about them. They may be interesting but will not help you much until you are getting hundreds of thousands of hits and are running your own servers.
- Page not found errors: This list gives you a cross section of calls to your web server that did not go through. There are a great number of errors that don’t mean a lot except to the administrator of the web server. This is the one area where it is easy to spot certain types of hacking attempts. The page errors you are interested in are errors calling specific pages and/or images on your site. All of these errors will begin with no slash or a forward slash and will include the names of files in your site. You may also see a number of errors with forward and back slashes and calls to the operating system root. You can disregard most of these unless you are responsible for the security of the web server.
- Top Referring sites: This is a good list to look at. It will show you who is linking to your site and how many people are using each link. The most common referrers are your own site and No Referrer (visitors who typed your address or used a bookmark). The rest will probably be search engines.
- Top Search Engines: This list filters out all referrers except search engines and gives you a quick view of which search engines are doing the best for you.
- Top Keywords: LOOK AT THIS LIST EVERY MONTH. Keywords give you a glimpse into what people are using to find your site. If you are actively looking for more traffic, these keywords will help you determine which keywords to register with search engines or use in pay-per-click engines.
- Visiting Spiders: This is a list of the spiders that have visited your site. Spiders are programs run by search engines that come to your site and follow each link, indexing the words on each page. If you are actively registering your site, you will be able to tell when a specific search engine has spidered your site. This lets you know when to start checking to see how the registration worked.
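Top Referring Sites, Top Search Engines and Top Keywords all come from the referrer recorded with each request: the address of the page the visitor clicked from. The Python sketch below shows how those lists can be tallied with the standard `urllib.parse` module, including recovering keywords from a search engine’s referrer URL. The table of engines and their query parameters (`q` for Google and Bing, `p` for Yahoo) reflects common conventions and is an assumption; engines change their URL formats, so check the referrer URLs in your own logs. The host `www.example.com` stands in for your own site, and the sample referrers are invented.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumption: the query-string parameter each engine uses for the search terms.
KEYWORD_PARAMS = {"google.": "q", "bing.": "q", "yahoo.": "p"}

def referrer_report(referrers, own_host):
    """Tally referring sites, search engines and search keywords."""
    sites, engines, keywords = Counter(), Counter(), Counter()
    for ref in referrers:
        if not ref or ref == "-":              # "-" is logged when there is no referrer
            sites["No Referrer"] += 1
            continue
        parts = urlsplit(ref)
        host = parts.hostname or ""
        sites["(your own site)" if host == own_host else host] += 1
        for marker, param in KEYWORD_PARAMS.items():
            if marker in host:                 # crude host match, fine for a sketch
                engines[host] += 1
                # parse_qs decodes "+" and %-escapes, so "web+statistics" -> "web statistics"
                terms = parse_qs(parts.query).get(param)
                if terms:
                    keywords[terms[0].strip().lower()] += 1
                break
    return sites, engines, keywords

refs = ["-",
        "http://www.example.com/index.htm",
        "http://www.google.com/search?q=Web+Statistics",
        "http://www.google.com/search?q=web+statistics&hl=en",
        "http://search.yahoo.com/search?p=log+files",
        "http://partner.example.org/links.htm"]
sites, engines, keywords = referrer_report(refs, own_host="www.example.com")
print(sites.most_common(3))
print(keywords.most_common())  # [('web statistics', 2), ('log files', 1)]
```

Lowercasing the keywords folds "Web Statistics" and "web statistics" into one entry, which is usually what you want when deciding which terms to pursue.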
Now that we know a little more about what statistics are available through the analysis of log files, we can start to make some decisions about what measures we want to keep on our site. Some of these measurements can be taken straight from the analysis program. Total visitors or visitor sessions can be pulled straight from the stats program and plotted on a simple bar or line graph.
At this point most people ask why they need to graph the stats since they already have the numbers. After all, aren’t the numbers what we need to look at? The simple answer is NO. Generally the individual numbers mean much less than the numbers over time. That’s what a graph does for you. It displays numbers over time, giving you an easy-to-read graphical reference to the numbers. Sure, knowing that you had 10,000 unique visitors is important. Knowing whether the number is increasing or decreasing over time is more important.
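A trend graph does not need special software. As an illustration, this Python sketch prints a plain-text bar chart of monthly totals and adds the percent change from the previous month, which answers the “increasing or decreasing” question directly. The monthly figures are invented for the example, and the function name is hypothetical; a spreadsheet chart or a plotting library would serve the same purpose.

```python
def trend_chart(monthly, width=40):
    """Return a text bar chart of monthly totals with month-over-month change.

    monthly: dict of label -> total (e.g. unique visitors), in date order.
    """
    peak = max(monthly.values())
    rows, previous = [], None
    for label, value in monthly.items():
        bar = "#" * round(value / peak * width) if peak else ""
        if previous:                            # skip the first month and zero totals
            change = f" ({(value - previous) / previous * 100:+.0f}%)"
        else:
            change = ""
        rows.append(f"{label:>8} |{bar} {value:,}{change}")
        previous = value
    return "\n".join(rows)

# Hypothetical unique-visitor totals for three months.
print(trend_chart({"Oct 2001": 8000, "Nov 2001": 9200, "Dec 2001": 10000}))
```

Scaling every bar against the largest month keeps the chart readable whether the site gets hundreds or hundreds of thousands of visitors.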