1. What are the first and last dates in the logfile?
    First date:
    awk -F '\[' '{print $2}' access.log | awk -F '\]' '{print $1}' | head -1

    Result: 25/Sep/2005:06:32:26 +0200
    Last date:

    awk -F '\[' '{print $2}' access.log | awk -F '\]' '{print $1}' | tail -1

    Result: 02/Oct/2005:06:27:54 +0200
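    Both dates can also be read in a single pass over the file. The following is a minimal sketch, assuming the timestamp sits between the first "[" and "]" of each line and that the log is written in chronological order; the function name first_last_dates is made up for illustration.

```shell
#!/bin/sh
# first_last_dates is a hypothetical helper name, not part of the original answer.
# Assumes each line carries its timestamp between the first "[" and "]"
# and that the log is in chronological order.
first_last_dates() {
    awk '
        {
            s = index($0, "[")
            e = index($0, "]")
            if (s && e > s) {
                t = substr($0, s + 1, e - s - 1)
                if (first == "") first = t   # remember the first timestamp seen
                last = t                      # overwritten each line; ends as the last one
            }
        }
        END { if (first != "") { print "First: " first; print "Last: " last } }
    ' "$1"
}

# Usage: first_last_dates access.log
```

    This reads the file once instead of twice, which only matters for large logs.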

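  2. How many 'hits' did the server receive during this period?
    Answer: wc -l access.log
    Alternative (a trick): awk -F '\$' '{print $200}' access.log | uniq -c | awk '{print $1}'
    Result: 27922 access.log
    Explain: with $ as the field separator, field 200 is empty on every line, so uniq -c collapses all the empty lines into one group whose count is the number of lines in the file.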
  3. How many 'pages' were accessed?
    Answer: cut -d '"' -f 2 access.log | sort | uniq | wc -l
    Result: 6055
    Explain: use " as the delimiter and take the second field, which is the request line (the GET command). Sort it, reduce it to unique lines, and count those lines to get the number of unique pages. Strangely, the command should also work with uniq -c, but it does not: the result is only partially printed and the rest shows up as 0x40 and 0x90 bytes. So wc -l is used here to count the lines.
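    The request line contains the method and protocol as well as the path, so the same page fetched with GET and HEAD, or over HTTP/1.0 and HTTP/1.1, is counted more than once. The sketch below counts distinct paths only. It assumes the request line has the form "METHOD PATH PROTOCOL"; the function name unique_pages is hypothetical, and its result will generally differ from the figure above.

```shell
#!/bin/sh
# unique_pages is a hypothetical helper name.
# Assumes the second "-delimited field is the request line: METHOD PATH PROTOCOL.
unique_pages() {
    awk -F'"' '
        {
            split($2, req, " ")              # req[1]=method, req[2]=path, req[3]=protocol
            if (req[2] != "") print req[2]   # skip malformed or empty request lines
        }
    ' "$1" | sort -u | awk 'END { print NR }'  # count the distinct paths
}

# Usage: unique_pages access.log
```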
  4. How many bytes in total did the webserver serve? (in MB or GB)
    Answer: cut -d ' ' -f 10 access.log | sort -n > mynumber
    Result: 311844206 bytes = 297.40 MB
    Explain: the cut command extracts the response size in bytes (field 10) and writes the values to mynumber; the script below then reads that file line by line and sums the values.
    #!/bin/sh
    # Sum the byte counts in mynumber, one value per line.
    i=0
    linecount=0
    while read -r f
    do
      linecount=$((linecount + 1))
      # Add only non-empty lines made up entirely of digits (this skips "-").
      if [ -n "$f" ] && ! echo "$f" | grep -q "[^0-9]"; then
        i=$((i + f))
      fi
      # Print the running total; the last line printed is the final sum.
      echo "$linecount: $i"
    done < mynumber
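    The same sum can be computed without an intermediate file. This is a minimal sketch, assuming the byte count is the tenth whitespace-separated field and that entries without a size are logged as "-"; the function name total_bytes is hypothetical.

```shell
#!/bin/sh
# total_bytes is a hypothetical helper name.
# Assumes field 10 (split on whitespace) is the response size in bytes.
total_bytes() {
    awk '
        $10 ~ /^[0-9]+$/ { sum += $10 }   # ignore "-" and anything non-numeric
        END { printf "%.0f bytes = %.2f MB\n", sum, sum / 1048576 }
    ' "$1"
}

# Usage: total_bytes access.log
```

    Here 1 MB is taken as 1048576 bytes, which matches the conversion of 311844206 bytes to 297.40 MB.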
  5. Ascertain the 10 top scores for users.
    • provide the number of hits with the percentage it represents.
      Answer: awk '{ print $1}' access.log | sort | uniq -c | awk '{print $1,$1*100/27922,$2 }' | sort -n -r | head -10
      Result:
      2872 10.2858 crawl-66-249-65-79.googlebot.com
      1008 3.61006 gnowee.ic.uva.nl
      452 1.6188 adsl-200-25.dsl.uva.nl
      246 0.881026 64.124.85.72.become.com
      190 0.680467 d83-176-53-240.cust.tele2.ch
      157 0.562281 ip51cf5a32.direct-adsl.nl
      103 0.368885 n219077243196.netvigator.com
      103 0.368885 host213-106-248-101.no-dns-yet.ntli.net
      100 0.358141 renf-cache-9.server.ntli.net
      100 0.358141 cnr07-73.mdacc.tmc.edu
      Explain: take the first field (the hostname) from access.log and sort it so that uniq -c can count identical hosts. Then print the count, the count as a percentage of the total hits (27922), and the hostname. Sorting numerically in reverse order on the count and keeping the first 10 lines gives the top 10 users with their share of the total hits.
    • also translate all hostnames to ip-addresses
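      Answer: awk '{ print $1}' access.log | sort | uniq -c | awk '{print $1,$2 }' | sort -n -r | head -10 | awk '{print $1, $1*100/27922, $2, system("nslookup " $2)}'
      Explain: system() runs nslookup for each hostname, so the lookup output appears before the corresponding line, and the last value printed on that line is the exit status of nslookup.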
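    The percentages above depend on the hard-coded total of 27922. The sketch below derives the total from the file itself, so it keeps working if the log changes. It covers the counting and percentage step only, not the hostname lookup; the function name top_hosts and its optional row-count argument are assumptions made for illustration.

```shell
#!/bin/sh
# top_hosts is a hypothetical helper name.
# $1 = log file, $2 = number of rows to show (defaults to 10).
top_hosts() {
    awk '
        { hits[$1]++ }                     # count requests per host (field 1)
        END {
            for (h in hits)                # NR is the total number of log lines
                printf "%d %.2f%% %s\n", hits[h], hits[h] * 100 / NR, h
        }
    ' "$1" | LC_ALL=C sort -k1,1nr -k3,3 | head -n "${2:-10}"
}

# Usage: top_hosts access.log 10
```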
  6. Ascertain the 10 top scores of referrers.
    Answer: awk '{ print $11}' access.log | sort | uniq -c | awk '{print $1,$1*100/27922,$2 }' | sort -n -r | head -10
    Result:
    6710 24.0312 "-"
    2940 10.5293 "http://www.mintha.com/current/tenerife.html
    2176 7.79314 "http://www.mintha.com/current/paris.html
    1921 6.87988 "http://www.mintha.com/1999-07/beachboat.html
    828 2.9654 "http://www.mintha.com/current/ireland.html
    828 2.9654 "http://www.mintha.com/1999-10/prague.html
    607 2.17391 "http://www.mintha.com/current/romania.html
    501 1.79428 "http://www.mintha.com/current/camper2.html
    500 1.7907 "http://www.mintha.com/2000-09/biking2.html
    435 1.55791 "http://www.mintha.com/current/camper1.html
    Explain: same as the previous question, with the field number changed to 11 (the referrer). Internal referrers are not filtered out of this list.
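    Filtering out internal referrers could look like the sketch below. It assumes the referrer is the fourth "-delimited field of each line and that the site's own hostname is passed in by the caller; which hostname counts as internal is an assumption the reader has to supply. The function name external_referrers is hypothetical. Requests without a referrer ("-") are kept.

```shell
#!/bin/sh
# external_referrers is a hypothetical helper name.
# $1 = log file, $2 = hostname treated as internal (an assumption supplied
# by the caller), $3 = number of rows to show (defaults to 10).
external_referrers() {
    awk -F'"' -v site="$2" '
        # Skip referrers that start with the internal site URL.
        index($4, "http://" site) == 1 || index($4, "https://" site) == 1 { next }
        { refs[$4]++ }
        END {
            for (r in refs)                # percentages are relative to all hits (NR)
                printf "%d %.2f%% %s\n", refs[r], refs[r] * 100 / NR, r
        }
    ' "$1" | LC_ALL=C sort -k1,1nr -k3,3 | head -n "${3:-10}"
}

# Usage (the hostname here is only an example): external_referrers access.log www.example.org
```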
  7. Create a list of the browsers used
    • How can you filter the string reliably
      Answer: awk -F '"' '{print $6}' access.log | sort | uniq -c | sort -n -r | head -10
      Result:
      4577 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
      4260 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
      2719 Googlebot-Image/1.0
      1472 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
      1129 -
      1114 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
      843 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
      800 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
      291 Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
      266 Mozilla/5.0 (compatible; BecomeBot/2.3; MSIE 6.0 compatible; +http://www.become.com/site_owners.html)
    • Ascertain the top X browsers used.
      Answer: awk -F '"' '{print $6}' access.log | grep -E '[0-9a-zA-Z]*[oO][sS][ ][xX][0-9a-zA-Z]*' | sort | uniq -c | sort -n -r | head -10
      Result:
      185 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412.7 (KHTML, like Gecko) Safari/412.5
      94 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
      78 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2
      58 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/312.5 (KHTML, like Gecko) Safari/312.3
      52 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/312.1 (KHTML, like Gecko) Safari/312
      41 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/312.5 (KHTML, like Gecko) Safari/312.3
      38 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/124 (KHTML, like Gecko) Safari/125.1
      32 Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/312.5.1 (KHTML, like Gecko) Safari/312.3.1
      29 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4
      23 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; es-ES; rv:1.7.8) Gecko/20050511 Firefox/1.0.4
    • What OS do the users use (in percentages)
      Answer: awk -F '"' '{print $6}' access.log | awk -F ';' '{print $3}' | sed 's/)//g' | sort | uniq -c | sort -n -r | head -10 | awk '{print $1*100/27922, $0}'
      Result:
      60.0494 16767 Windows NT 5.1
      16.1665 4514
      9.50863 2655 Windows NT 5.0
      4.29052 1198 Windows 98
      1.93038 539 PPC Mac OS X
      1.52926 427 AOL 9.0
      0.952654 266 MSIE 6.0 compatible
      0.902514 252 +http://www.google.com/bot.html
      0.7127 199 Mac_PowerPC
      0.605258 169 Windows NT 5.2
      Explain: the third ';'-separated field of the user-agent string is taken as the OS, and each count is divided by the total hits (27922). The line with no name (4514 hits) covers user-agent strings that have no third field, such as Googlebot-Image/1.0.

    • Think of a fun search/statistic of your own and provide the answer.
      Answer: awk -F '"' '{print $6}' access.log | awk -F ';' '{print $3}' | sed 's/)//g' | grep -E '[0-9a-zA-Z]*([mM][aA][cC])+[0-9a-zA-Z]*' | sort | uniq -c | sort -n -r | head -10
      Result:
      539 PPC Mac OS X
      199 Mac_PowerPC
      165 PPC Mac OS X Mach-O
      Explain: list the Mac users by keeping only the OS strings that contain "mac" in any letter case.
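    Taking the third ';'-separated field as the OS, as in the OS breakdown above, also produces entries such as AOL 9.0 and MSIE 6.0 compatible that are not operating systems. A rough alternative is to match the whole user-agent string against a few patterns and group the hits into OS families. The sketch below assumes the user agent is the sixth "-delimited field; the function name os_share and the chosen patterns and family names are illustrative assumptions, not a complete classification.

```shell
#!/bin/sh
# os_share is a hypothetical helper name; the patterns are illustrative only.
# Assumes the user-agent string is the sixth "-delimited field.
os_share() {
    awk -F'"' '
        {
            ua = $6
            if      (ua ~ /Windows/)                        os = "Windows"
            else if (ua ~ /Mac OS X|Mac_PowerPC|Macintosh/) os = "Mac"
            else if (ua ~ /Linux/)                          os = "Linux"
            else                                            os = "Other/unknown"
            n[os]++
        }
        END {
            for (o in n)                    # share of all hits, then count, then family
                printf "%.2f%% %d %s\n", n[o] * 100 / NR, n[o], o
        }
    ' "$1" | LC_ALL=C sort -k2,2nr -k3,3
}

# Usage: os_share access.log
```

    Bots and empty user agents end up in Other/unknown, so the percentages describe hits rather than human visitors.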