- What are the first and last dates in the logfile?
First date: awk -F '\[' '{print $2}' access.log | awk -F '\]' '{print $1}' | head -1
Result: 25/Sep/2005:06:32:26 +0200
Last date: awk -F '\[' '{print $2}' access.log | awk -F '\]' '{print $1}' | tail -1
Result: 02/Oct/2005:06:27:54 +0200
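Both dates can also be read in a single pass over the file instead of running the pipeline twice. The following is a minimal sketch, assuming every log line carries its timestamp between the first "[" and the first "]"; the function name log_date_range is made up for this illustration.

```shell
#!/bin/sh
# Print the first and the last timestamp of an access log, one per line.
# log_date_range is a hypothetical helper name, not part of the original answer.
log_date_range() {
    awk '
        {
            s = index($0, "[")                  # position of the opening bracket
            e = index($0, "]")                  # position of the closing bracket
            if (s == 0 || e <= s) next          # skip lines without a timestamp
            ts = substr($0, s + 1, e - s - 1)   # text between the brackets
            if (first == "") first = ts         # remember the first one only once
            last = ts                           # overwritten on every line
        }
        END { if (first != "") { print first; print last } }
    ' "$1"
}
```

Usage: log_date_range access.log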
- How many 'hits' did the server receive during this period?
Answer: wc -l access.log
Another (tricky): awk -F '\$' '{print $200}' access.log | uniq -c | awk '{print $1}'
Result: 27922 access.log
- How many 'pages' were accessed?
Answer: cut -d '"' -f 2 access.log | sort | uniq | wc -l
Result: 6055
Explain: use " as the delimiter and take the second field, which is the request line (the "GET" command). Sort it, reduce it to unique lines, then count those lines to get the number of unique pages. Strangely, uniq -c should also work here but did not: the output was only partially printed and the rest showed up as 0x40 and 0x90 codes, so wc is used to count the lines instead.
- How many bytes in total did the webserver serve? (in MB or GB)
Answer: cut -d ' ' -f 10 access.log | sort -n > mynumber
Result: 311844206 bytes = 297.40 MB
Explain: the cut command extracts the response size in bytes from each line and writes it to mynumber. The script below then reads that file line by line and sums the values, skipping entries that are not purely numeric (such as "-").
#!/bin/sh
i=0
linecount=0
while read f
do
    linecount=`expr "$linecount" + "1"`
    # grep succeeds when the line contains a non-digit character
    echo $f | grep "[^0-9]" > /dev/null 2>&1
    if [ "$?" != "0" ]; then
        i=`expr "$i" + "$f"`
    fi
    echo $linecount: $i
done < mynumber
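The same total can be computed without the intermediate file. This is a sketch only, assuming the response size is the tenth whitespace-separated field (as in the cut command above) and taking 1 MB as 1048576 bytes; total_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Sum the byte counts in field 10 and print the total in bytes and in MB.
# total_bytes is a hypothetical helper name; 1 MB is taken as 1024 * 1024 bytes.
total_bytes() {
    awk '
        $10 ~ /^[0-9]+$/ { sum += $10 }   # ignore "-" and other non-numeric sizes
        END { printf "%.0f bytes = %.2f MB\n", sum, sum / 1048576 }
    ' "$1"
}
```

Usage: total_bytes access.log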
- Ascertain the 10 topscores for users.
- provide the number of hits with the percentage it represents.
Answer: awk '{ print $1}' access.log | sort | uniq -c | awk '{print $1,$1*100/27922,$2 }' | sort -n -r | head -10
Result:
2872 10.2858 crawl-66-249-65-79.googlebot.com
1008 3.61006 gnowee.ic.uva.nl
452 1.6188 adsl-200-25.dsl.uva.nl
246 0.881026 64.124.85.72.become.com
190 0.680467 d83-176-53-240.cust.tele2.ch
157 0.562281 ip51cf5a32.direct-adsl.nl
103 0.368885 n219077243196.netvigator.com
103 0.368885 host213-106-248-101.no-dns-yet.ntli.net
100 0.358141 renf-cache-9.server.ntli.net
100 0.358141 cnr07-73.mdacc.tmc.edu
Explain: take the first field (the hostname) from access.log and sort it so that uniq -c can count each distinct host. Then print the count, the count divided by the total number of hits (as a percentage), and the hostname. Sorting numerically in reverse order and keeping the first 10 lines gives the top 10 users with their share of the total hits.
- also translate all hostnames to ip-addresses
Answer: awk '{ print $1}' access.log | sort | uniq -c | awk '{print $1,$2 }' | sort -n -r | head -10 | awk '{print $1, $1*100/27922, $2, system("nslookup " $2)}'
- ascertain the 10 topscores of referrers.
Answer: awk '{ print $11}' access.log | sort | uniq -c | awk '{print $1,$1*100/27922,$2 }' | sort -n -r | head -10
Result:
6710 24.0312 "-"
2940 10.5293 "http://www.mintha.com/current/tenerife.html"
2176 7.79314 "http://www.mintha.com/current/paris.html"
1921 6.87988 "http://www.mintha.com/1999-07/beachboat.html"
828 2.9654 "http://www.mintha.com/current/ireland.html"
828 2.9654 "http://www.mintha.com/1999-10/prague.html"
607 2.17391 "http://www.mintha.com/current/romania.html"
501 1.79428 "http://www.mintha.com/current/camper2.html"
500 1.7907 "http://www.mintha.com/2000-09/biking2.html"
435 1.55791 "http://www.mintha.com/current/camper1.html"
Explain: same as the previous question, with the field number changed to 11. This version does not filter out internal referrers.
- also try to filter out the internal referrers, so you end up with a list of external referrers only.
Answer: awk '{ print $11}' access.log | grep -v "mintha.com" | sort | uniq -c | awk '{print $1,$1*100/27922,$2 }' | sort -n -r | head -10
Result:
6710 24.0312 "-"
302 1.08158 "http://p086.ezboard.com/falmshousefrm7"
48 0.171907 "http://pub208.ezboard.com/falmshousefrm7"
38 0.136093 "http://p086.ezboard.com/falmshousefrm7.showMessage?topicID=11269.topic"
31 0.111024 "http://www.ultralinux.org/"
24 0.0859537 "http://partyflock.nl/topic/719231/PAGE/149.html"
17 0.0608839 "http://www.dreamcommunity.nl/index.php?id=109&account=rEhWFvq"
16 0.0573025 "http://birdgirl13.blogmaker.com/"
12 0.0429769 "http://p086.ezboard.com/falmshousefrm7.showMessage?topicID=11261.topic"
11 0.0393955 "http://www.altavista.com/image/randomlink"
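The list above still contains "-", which stands for requests that sent no referrer at all, and the total of 27922 hits is typed in by hand. A possible variant, given here as a sketch, drops the "-" entries as well and takes the total from the number of lines awk has read. It assumes the referrer is field 11, as in the answer above; external_referrers and its second argument (the site's own domain, matched as a plain substring) are hypothetical names.

```shell
#!/bin/sh
# Top 10 external referrers with their share of all hits.
# external_referrers is a hypothetical helper: $1 = log file, $2 = own domain.
external_referrers() {
    awk -v own="$2" '
        { ref = $11 }
        ref == "\"-\"" || index(ref, own) { next }   # skip empty and internal referrers
        { count[ref]++ }
        END {
            # NR is the total number of lines read, skipped ones included
            for (ref in count)
                printf "%d %.4f %s\n", count[ref], count[ref] * 100 / NR, ref
        }
    ' "$1" | sort -n -r | head -10
}
```

Usage: external_referrers access.log mintha.com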
- create a list of the browsers used
- How can you filter the string reliably
Answer: awk -F '"' '{print $6}' access.log | sort | uniq -c | sort -n -r | head -10
Result:
4577 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
4260 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
2719 Googlebot-Image/1.0
1472 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
1129 -
1114 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
843 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
800 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
291 Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
266 Mozilla/5.0 (compatible; BecomeBot/2.3; MSIE 6.0 compatible; +http://www.become.com/site_owners.html)
- Ascertain the top X browsers used.
Answer: awk -F '"' '{print $6}' access.log | grep -i 'os x' | sort | uniq -c | sort -n -r | head -10
Result:
185 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412.7 (KHTML, like Gecko) Safari/412.5
94 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
78 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2
58 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/312.5 (KHTML, like Gecko) Safari/312.3
52 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/312.1 (KHTML, like Gecko) Safari/312
41 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/312.5 (KHTML, like Gecko) Safari/312.3
38 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/124 (KHTML, like Gecko) Safari/125.1
32 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/312.5.1 (KHTML, like Gecko) Safari/312.3.1
29 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4
23 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; es-ES; rv:1.7.8) Gecko/20050511 Firefox/1.0.4
- What OS do the users use (in percentages)
Answer: awk -F '"' '{print $6}' access.log | awk -F ';' '{print $3}' | sed 's/)//g' | sort | uniq -c | sort -n -r | head -10 | awk '{print $1/27922, $0}'
Result:
0.600494 16767 Windows NT 5.1
0.161665 4514
0.0950863 2655 Windows NT 5.0
0.0429052 1198 Windows 98
0.0193038 539 PPC Mac OS X
0.0152926 427 AOL 9.0
0.00952654 266 MSIE 6.0 compatible
0.00902514 252 +http://www.google.com/bot.html
0.007127 199 Mac_PowerPC
0.00605258 169 Windows NT 5.2
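Taking the third semicolon-separated field of the user agent also picks up values that are not operating systems, such as "AOL 9.0" or a bot URL. One alternative, sketched here, is to sort each user agent into a coarse OS family by keyword and print real percentages. The family names and keywords are assumptions chosen for this illustration, not a complete classification; os_families is a hypothetical helper name.

```shell
#!/bin/sh
# Share of hits per coarse OS family, derived from the user-agent string.
# os_families is a hypothetical helper; the keyword lists are illustrative only.
os_families() {
    awk -F '"' '
        {
            ua = tolower($6)                     # user agent is the 6th quote-delimited field
            if      (ua ~ /windows/)                        os = "Windows"
            else if (ua ~ /mac os x|mac_powerpc|macintosh/) os = "Mac"
            else if (ua ~ /linux/)                          os = "Linux"
            else                                            os = "Other/unknown"
            count[os]++
        }
        END {
            # NR is the total number of hits, so no hand-typed divisor is needed
            for (os in count)
                printf "%.2f%% %d %s\n", count[os] * 100 / NR, count[os], os
        }
    ' "$1" | sort -n -r
}
```

Usage: os_families access.log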
- Think of a fun search/statistic of your own and provide the answer.
Answer: awk -F '"' '{print $6}' access.log | awk -F ';' '{print $3}' | sed 's/)//g' | grep -i 'mac' | sort | uniq -c | sort -n -r | head -10
Result:
539 PPC Mac OS X
199 Mac_PowerPC
165 PPC Mac OS X Mach-O
Explain: list the Mac users, grouped by the platform string in their user agent.