
Parsing Apache Logs

Below are some simple command-line scripts, built from awk, sed and similar tools, for parsing Apache logs and getting quick statistics.

Unique visitors per day

Here access.log is your combined-format log file, with entries like the following:

access.log
69.175.xxx.yyy - - [13/Jul/2013:06:28:31 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.0" 404 4212 "http://somesubdomain.example.org/some/web/folder/some_web_page1" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1"
69.175.xxx.yyy - - [13/Jul/2013:06:28:35 -0500] "GET /some/web/folder/?do=register HTTP/1.0" 302 599 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1"
69.175.xxx.yyy - - [13/Jul/2013:06:28:36 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.0" 404 4212 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1"
69.175.xxx.yyy - - [13/Jul/2013:06:28:41 -0500] "POST /some/web/folder/somewebpage2 HTTP/1.0" 200 2439 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1"

94.123.xxx.yyy - - [13/Jul/2013:06:32:49 -0500] "GET /some/web/folder/some_web_page1 HTTP/1.1" 200 5121 "http://somesubdomain.example.org/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:49 -0500] "GET /some/web/folder/?do=login HTTP/1.1" 302 599 "http://somesubdomain.example.org/some/web/folder/some_web_page1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:50 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.1" 404 4214 "http://somesubdomain.example.org/some/web/folder/some_web_page1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:50 -0500] "GET /some/web/folder/?do=register HTTP/1.1" 302 599 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:50 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.1" 404 4214 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:51 -0500] "POST /some/web/folder/somewebpage2 HTTP/1.1" 200 2530 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
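
By default awk splits each line on whitespace, so in a log line like the ones above the client IP is field 1 and the opening part of the timestamp is field 4. The sketch below prints the first seven fields of one shortened sample line to make that visible. The name show_fields is a hypothetical helper, not something from the original script.

```shell
# show_fields is a hypothetical helper name used only for this illustration.
# It prints the first seven whitespace-separated fields of each input line.
show_fields() {
    awk '{ for (i = 1; i <= 7; i++) print "$" i " = " $i }'
}

# A shortened copy of the first sample line (referrer and user agent omitted).
line='69.175.xxx.yyy - - [13/Jul/2013:06:28:31 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.0" 404 4212'
printf '%s\n' "$line" | show_fields
```

Field 4 comes out as [13/Jul/2013:06:28:31, with the leading bracket still attached, while the timezone offset lands in field 5.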
Below is the command-line script. It counts the unique visitors (distinct client IPs) per day. The grep at the very end filters the result down to the month and year you are interested in.

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1 | awk '{print $2 " " $1}' | \
sort | uniq | awk '{print $1}' | uniq -c | grep "Feb/2013"
The output is as below:

52 01/Feb/2013
63 02/Feb/2013
47 03/Feb/2013
62 04/Feb/2013
59 05/Feb/2013
63 06/Feb/2013

etc.
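
As an alternative to the pipeline above, the same count can be sketched in a single awk pass. This is not the original author's method, only a possible variant: unique_per_day is a hypothetical function name, and its optional second argument plays the role of the final grep. It assumes the timestamp is field 4 in the usual [dd/Mon/yyyy:... form.

```shell
# Hypothetical helper (not from the original article).
# Usage: unique_per_day FILE [MONTH/YEAR], for example: unique_per_day access.log Feb/2013
unique_per_day() {
    awk -v filter="${2:-}" '
    {
        day = substr($4, 2, 11)   # "[13/Jul/2013:06:28:31" -> "13/Jul/2013"
        if (filter != "" && index(day, filter) == 0) next
        # Count each (day, IP) pair only the first time it is seen.
        if (!seen[day SUBSEP $1]++) count[day]++
    }
    END { for (d in count) print count[d], d }
    ' "$1" | sort -k2
}
```

Unlike uniq -c, this prints the counts without leading padding. As with the original pipeline, the final sort orders the dates as plain text, which is chronological within a single month but not across months.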
Explanation of the command

This may be useful if you want to tweak it.

Break down 1

Get IP and date (unformatted at this stage)

cat access.log | awk '{print $1 " " $4}'
Output of the above:

69.175.xxx.yyy [13/Jul/2013:06:28:31
69.175.xxx.yyy [13/Jul/2013:06:28:35
69.175.xxx.yyy [13/Jul/2013:06:28:36
69.175.xxx.yyy [13/Jul/2013:06:28:41
94.123.xxx.yyy [13/Jul/2013:06:32:49
94.123.xxx.yyy [13/Jul/2013:06:32:49
94.123.xxx.yyy [13/Jul/2013:06:32:50
94.123.xxx.yyy [13/Jul/2013:06:32:50
94.123.xxx.yyy [13/Jul/2013:06:32:50
94.123.xxx.yyy [13/Jul/2013:06:32:51
Break down 2

Remove the [ bracket with sed. Remove the time portion of the output with cut.

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1
Output of the above:

69.175.xxx.yyy 13/Jul/2013
69.175.xxx.yyy 13/Jul/2013
69.175.xxx.yyy 13/Jul/2013
69.175.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
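
One caveat with cutting on ":" is that the cut applies to the whole line, so an IPv6 client address (which itself contains colons) would be truncated at its first colon. A possible variant, sketched here under the assumption that the timestamp is still field 4, splits only that field. The name ip_and_date is hypothetical and not part of the original script.

```shell
# Hypothetical helper: print "IP date" per request without cutting the IP itself.
# Reads the named files, or standard input when no file is given.
ip_and_date() {
    awk '{
        split($4, parts, ":")             # parts[1] is "[13/Jul/2013"
        print $1 " " substr(parts[1], 2)  # drop the leading "["
    }' "$@"
}
```

Its output has the same "IP date" shape as this step, so the rest of the pipeline can stay unchanged.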
Break down 3

Swap the IP and the date so that the date comes first and the IP second.

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1 | awk '{print $2 " " $1}'
Output of the above:

13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
Break down 4

Remove duplicate lines to get the unique IPs per day. The sort is needed because uniq only collapses adjacent duplicates.

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1 | awk '{print $2 " " $1}' | \
sort | uniq
Output of the above:

13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
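
The reason for sorting before uniq is that uniq only compares neighbouring lines. The sketch below uses a small interleaved sample (the sample function is a hypothetical stand-in for the real output of the previous step) to show the difference, and that sort -u gives the same result as sort | uniq.

```shell
# Hypothetical sample: the same IP appears before and after another one.
sample() {
    printf '%s\n' \
        '13/Jul/2013 69.175.xxx.yyy' \
        '13/Jul/2013 94.123.xxx.yyy' \
        '13/Jul/2013 69.175.xxx.yyy'
}

sample | uniq | wc -l          # 3 lines: no duplicates are adjacent, so none are removed
sample | sort | uniq | wc -l   # 2 lines
sample | sort -u | wc -l       # 2 lines: sort -u sorts and deduplicates in one step
```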
Break down 5

Now the IPs are no longer interesting, as we only need their count. So drop the IP by printing only the date, then run uniq -c on the output to get the count for each date.

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1 | awk '{print $2 " " $1}' | \
sort | uniq | awk '{print $1}' | uniq -c
Output of the above. There are only two unique visitors in our example for the one date.

2 13/Jul/2013

Source: http://tech.snathan.org/tech/apache/log_parsing
