Log File Sample Explained
The following is a fragment from the server logs for loganalyzer.net. All relative URLs are relative to the base URL http://www.loganalyzer.net/.
First, let's look at a fragment of the log file:
22.214.171.124 - - [08/Oct/2007:04:54:20 -0400] "GET /support.html HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
126.96.36.199 - - [08/Oct/2007:11:17:55 -0400] "GET / HTTP/1.1" 200 10801 "http://www.google.com/search?q=log+analyzer&ie=utf-8&oe=utf-8 &aq=t&rls=org.mozilla:en-US:official&client=firefox-a" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:188.8.131.52) Gecko/20070914 Firefox/184.108.40.206"
220.127.116.11 - - [08/Oct/2007:11:17:55 -0400] "GET /style.css HTTP/1.1" 200 3225 "http://www.loganalyzer.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:18.104.22.168) Gecko/20070914 Firefox/22.214.171.124"
(Note: I've added some line breaks for clarity and changed the IP address to 126.96.36.199 to protect the privacy of the actual visitor.)
The fragment shown represents two visitors to our web site:
- The visitor from 188.8.131.52 is the Google spider. It retrieves our pages and indexes them for Google's search engine.
- Someone from IP address 184.108.40.206 (changed to protect their identity) searched Google for the phrase "log analyzer" and then looked at the Nihuo Web Log Analyzer home page.
A few things to note:
- Each line in the file represents a single "hit" on a file on the web server and consists of a number of fields (explained below).
- A page view is not the same as a web server "hit". For example, if a web page contains 5 images, one view of that page generates 6 hits on the web server: one hit for the page itself and 5 hits for the images.
- A unique visitor is identified by IP address or cookie. By default, a visit (session) ends when the visitor has been inactive for more than 30 minutes, so one unique visitor can come to your web site twice and be reported as two visits.
If the visitor leaves the web site and comes back more than 30 minutes later, Nihuo Web Log Analyzer reports 2 visits. If the visitor comes back within 30 minutes, Nihuo Web Log Analyzer still reports 1 visit.
- The log file is in Apache/NCSA combined log format. The W3C maintains a standard format for web server log files, and other proprietary formats also exist.
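The 30-minute rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Nihuo Web Log Analyzer's actual implementation; the function name `count_visits` and the sample times after the first one are made up for the example.

```python
from datetime import datetime, timedelta

# Assumption: the default inactivity limit described in the text.
SESSION_TIMEOUT = timedelta(minutes=30)

def count_visits(hit_times):
    """Count visits for ONE visitor, given the timestamps of that visitor's hits."""
    visits = 0
    last_seen = None
    for t in sorted(hit_times):
        # A new visit starts on the first hit, or after more than 30 idle minutes.
        if last_seen is None or t - last_seen > SESSION_TIMEOUT:
            visits += 1
        last_seen = t
    return visits

hits = [
    datetime(2007, 10, 8, 11, 17, 55),  # first hit (time taken from the fragment)
    datetime(2007, 10, 8, 11, 30, 0),   # hypothetical: about 12 minutes later, same visit
    datetime(2007, 10, 8, 12, 15, 0),   # hypothetical: 45 minutes later, new visit
]
print(count_visits(hits))  # 2
```

In a real report the hits would first be grouped by visitor (IP address or cookie) and then counted per group.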
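Different servers have different log formats. Nevertheless, the data in this log fragment is pretty typical of the information available. Let's look at one line from the above fragment (split for easier viewing).
22.214.171.124 - -
[08/Oct/2007:11:17:55 -0400]
"GET / HTTP/1.1"
200 10801
"http://www.google.com/search?q=log+analyzer&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a"
"Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:220.127.116.11) Gecko/20070914 Firefox/18.104.22.168"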
IP address : "22.214.171.124"
This is the IP address of the machine that contacted our site. Nihuo Web Log Analyzer can tell you which city the visitor is in by looking up the address in its built-in IP-to-city database.
Remote log name: "-"
This is the client's identity as reported by identd. It appears as a dash unless IdentityCheck is enabled on your web server.
Authenticated user name : "-"
Only available when the requested content is password protected by the web server's authentication system. Otherwise it appears as a dash.
Timestamp : [08/Oct/2007:11:17:55 -0400]
The time of the request as seen by the web server. -0400 is the time zone of your web server, written as an offset from UTC (here, 4 hours behind).
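If you want to work with this timestamp yourself, Python's standard `datetime.strptime` can read it. The sketch below assumes an English (or C) locale, because the `%b` month abbreviation ("Oct") is locale dependent.

```python
from datetime import datetime, timezone

raw = "[08/Oct/2007:11:17:55 -0400]"

# Strip the square brackets, then parse day/month/year:time and the UTC offset.
ts = datetime.strptime(raw.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")

print(ts.isoformat())                           # 2007-10-08T11:17:55-04:00
print(ts.astimezone(timezone.utc).isoformat())  # 2007-10-08T15:17:55+00:00
```

Converting everything to UTC is one simple way to compare logs from servers in different time zones.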
Access request : "GET / HTTP/1.1"
The request made. In this case it was a "GET" request (i.e. "show me the page") for the file "/" (the home page) using the "HTTP/1.1" protocol.
Detailed information about the HTTP protocol is available at http://en.wikipedia.org/wiki/HTTP.
Result status code : "200"
The resulting status code, which tells you whether the request was successful or not. "200" means success. Nihuo Web Log Analyzer uses this information to tell you about any errors visitors saw (e.g. HTTP 404 "File Not Found" or HTTP 500 "Internal Server Error").
For a list of possible codes, visit http://en.wikipedia.org/wiki/List_of_HTTP_status_codes.
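As a rough illustration of how status codes can be grouped for an error report, here is a small sketch. The first digit of the code gives its general class. The helper name `status_class` is made up, and the 404 and 500 entries are invented hits added only to show the error classes (the fragment above contains only 200s).

```python
from collections import Counter

def status_class(code):
    """Map an HTTP status code to its general class using the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")

# Three 200s from the fragment, plus two hypothetical error hits.
codes = [200, 200, 200, 404, 500]

counts = Counter(status_class(c) for c in codes)
for name, n in counts.items():
    print(name, n)
```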
Bytes transferred : "10801"
The number of bytes transferred to the user, i.e. the bandwidth used. In this case the home page is 10801 bytes, or about 10K. By adding up these values, Nihuo Web Log Analyzer can tell you the total bandwidth used by your site, as well as the total used for each file and each visitor.
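The adding up described here can be sketched as follows, using the byte counts of the three hits in the fragment. The visitor labels "googlebot" and "visitor" are stand-ins for the IP addresses, and the variable names are made up for the example.

```python
from collections import defaultdict

# (visitor label, requested file, bytes transferred) for the three hits in the fragment.
hits = [
    ("googlebot", "/support.html", 11179),
    ("visitor", "/", 10801),
    ("visitor", "/style.css", 3225),
]

total = 0
per_file = defaultdict(int)
per_visitor = defaultdict(int)

for visitor, path, size in hits:
    total += size
    per_file[path] += size
    per_visitor[visitor] += size

print(total)                   # 25205
print(per_visitor["visitor"])  # 14026
```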
Referrer URL : "http://www.google.com/search?q=log+analyzer&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a"
The referring URL. Not all user agents (see below) supply this information. This is the page the visitor was on when they clicked to come to this page. Usually this will mean that this page has a link to yours, but sometimes it is simply the page the user was looking at when they typed your address into their browser, or clicked on your address in some other software such as a news reader or an email client.
In this example, the referrer is a Google results page for the query "log analyzer".
Depending on the browser used, visitors may be able to "withhold" this information, although doing so makes it a little harder for webmasters to optimize their sites. Where the referrer is withheld, it appears in the log as "-".
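To show how the search phrase can be recovered from a referrer like the one above, here is a minimal sketch using Python's standard `urllib.parse`. It assumes the search engine puts the query in a `q` parameter, as Google does in this example. The function name `search_phrase` is made up.

```python
from urllib.parse import urlsplit, parse_qs

def search_phrase(referrer):
    """Return the value of the 'q' query parameter, or None if there is none."""
    if referrer in ("", "-"):        # "-" means the referrer was withheld
        return None
    query = parse_qs(urlsplit(referrer).query)
    terms = query.get("q")           # assumption: the search phrase lives in 'q'
    return terms[0] if terms else None

referrer = ("http://www.google.com/search?q=log+analyzer&ie=utf-8&oe=utf-8"
            "&aq=t&rls=org.mozilla:en-US:official&client=firefox-a")

print(search_phrase(referrer))  # log analyzer
print(search_phrase("-"))       # None
```

Other search engines may use a different parameter name, so a real tool would need a table of engines and their parameters.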
User Agent : "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:126.96.36.199) Gecko/20070914 Firefox/188.8.131.52"
The "User Agent" identifier. The User Agent is whatever software the visitor used to access this site. It's usually a browser, but it could equally be a web robot, a link checker, an FTP client or an offline browser.
The "user agent" string is set by the software manufacturer and can be anything they choose. As such it can't be relied upon, although most reputable software writers will use a string that helps identify the client.
In this case, "Mozilla/5.0" probably means the visitor's browser is Mozilla compatible, "Windows NT 5.2" indicates Windows Server 2003, "en-US" probably implies it's a US English version, and "Firefox/184.108.40.206" means Firefox 2.0. In the first line of the fragment, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" means the hit was caused by the Google bot (spider).
Some agents allow you to withhold this identifier, some let you set it yourself, and others will actually "fake" it to look like something else. Where the agent is withheld, it appears in the log as "-".
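A very rough way to separate robots from browsers is to look for tell-tale words in the user agent string. The sketch below is only a heuristic: the marker words are an assumption chosen for this example, and, as noted above, the string can be faked or withheld.

```python
# Assumption: these substrings are illustrative markers, not a complete list.
BOT_MARKERS = ("bot", "spider", "crawl")

def looks_like_bot(user_agent):
    """Guess whether a user agent string belongs to a web robot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
firefox = "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) Gecko/20070914 Firefox"  # shortened

print(looks_like_bot(googlebot))  # True
print(looks_like_bot(firefox))    # False
print(looks_like_bot("-"))        # False
```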
For a list of user agents and web spiders, visit http://www.user-agents.org/.
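Putting the fields above together, one line of a combined-format log can be pulled apart with a regular expression. This is a minimal sketch, not how Nihuo Web Log Analyzer itself parses logs. The pattern and group names are our own, and it does not handle quote characters escaped inside a field.

```python
import re

# One named group per field, in the order explained above.
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The first line of the fragment, as it appears above.
line = ('22.214.171.124 - - [08/Oct/2007:04:54:20 -0400] '
        '"GET /support.html HTTP/1.1" 200 11179 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

match = COMBINED.match(line)
fields = match.groupdict()

# The request itself splits into method, path and protocol.
method, path, protocol = fields["request"].split()

print(fields["status"], fields["bytes"])  # 200 11179
print(method, path, protocol)             # GET /support.html HTTP/1.1
```

In practice `COMBINED.match` returns None for lines that do not fit the pattern, so a real script should check for that before calling `groupdict()`.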