{"id":2801,"date":"2014-07-31T20:04:09","date_gmt":"2014-07-31T20:04:09","guid":{"rendered":"http:\/\/www.deuzebranaweb.com.br\/?p=2801"},"modified":"2014-07-31T20:04:09","modified_gmt":"2014-07-31T20:04:09","slug":"parsing-apache-logs","status":"publish","type":"post","link":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/2014\/07\/31\/parsing-apache-logs\/","title":{"rendered":"Parsing Apache Logs"},"content":{"rendered":"<p>cat access.log | awk &#8216;{print $1 &#8221; &#8221; $4}&#8217;<\/p>\n<p>Below are some simple awk\/sed\/etc command line scripts to parse apache logs and get quick statistics<\/p>\n<p>Unique visitors per day<\/p>\n<p>Where access.log is your combined log file with typical format as below:<\/p>\n<p>access.log<br \/>\n69.175.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:28:31 -0500] &#8220;GET \/some\/web\/folder\/somewebpage2 HTTP\/1.0&#8221; 404 4212 &#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/some_web_page1&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/535.1 (KHTML, like Gecko) Chrome\/13.0.782.112 Safari\/535.1&#8221;<br \/>\n69.175.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:28:35 -0500] &#8220;GET \/some\/web\/folder\/?do=register HTTP\/1.0&#8221; 302 599 &#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/somewebpage2&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/535.1 (KHTML, like Gecko) Chrome\/13.0.782.112 Safari\/535.1&#8221;<br \/>\n69.175.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:28:36 -0500] &#8220;GET \/some\/web\/folder\/somewebpage2 HTTP\/1.0&#8221; 404 4212 &#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/somewebpage2&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/535.1 (KHTML, like Gecko) Chrome\/13.0.782.112 Safari\/535.1&#8221;<br \/>\n69.175.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:28:41 -0500] &#8220;POST \/some\/web\/folder\/somewebpage2 HTTP\/1.0&#8221; 200 2439 
&#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/somewebpage2&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/535.1 (KHTML, like Gecko) Chrome\/13.0.782.112 Safari\/535.1&#8221;<\/p>\n<p>94.123.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:32:49 -0500] &#8220;GET \/some\/web\/folder\/some_web_page1 HTTP\/1.1&#8221; 200 5121 &#8220;http:\/\/somesubdomain.example.org\/&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko\/20100101 Firefox\/5.0&#8221;<br \/>\n94.123.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:32:49 -0500] &#8220;GET \/some\/web\/folder\/?do=login HTTP\/1.1&#8221; 302 599 &#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/some_web_page1&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko\/20100101 Firefox\/5.0&#8221;<br \/>\n94.123.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:32:50 -0500] &#8220;GET \/some\/web\/folder\/somewebpage2 HTTP\/1.1&#8221; 404 4214 &#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/some_web_page1&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko\/20100101 Firefox\/5.0&#8221;<br \/>\n94.123.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:32:50 -0500] &#8220;GET \/some\/web\/folder\/?do=register HTTP\/1.1&#8221; 302 599 &#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/somewebpage2&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko\/20100101 Firefox\/5.0&#8221;<br \/>\n94.123.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:32:50 -0500] &#8220;GET \/some\/web\/folder\/somewebpage2 HTTP\/1.1&#8221; 404 4214 &#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/somewebpage2&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko\/20100101 Firefox\/5.0&#8221;<br \/>\n94.123.xxx.yyy &#8211; &#8211; [13\/Jul\/2013:06:32:51 -0500] &#8220;POST \/some\/web\/folder\/somewebpage2 HTTP\/1.1&#8221; 200 2530 &#8220;http:\/\/somesubdomain.example.org\/some\/web\/folder\/somewebpage2&#8221; &#8220;Mozilla\/5.0 (Windows NT 6.1; 
WOW64; rv:5.0) Gecko\/20100101 Firefox\/5.0&#8243;<br \/>\nBelow is the command line script. This gets the unique hits per day. There is a grep at the very end to do a final filter for the Month and Year you may be looking for.<\/p>\n<p>cat access.log | awk &#8216;{print $1 &#8221; &#8221; $4}&#8217; | sed &#8216;s\/\\[\/\/&#8217; | cut -d&#8221;:&#8221; -f1 | awk &#8216;{print $2 &#8221; &#8221; $1}&#8217; | \\<br \/>\nsort | uniq | awk &#8216;{print $1}&#8217; | uniq -c | grep &#8220;Feb\/2013&#8243;<br \/>\nThe output is as below<\/p>\n<p>     52 01\/Feb\/2013<br \/>\n     63 02\/Feb\/2013<br \/>\n     47 03\/Feb\/2013<br \/>\n     62 04\/Feb\/2013<br \/>\n     59 05\/Feb\/2013<br \/>\n     63 06\/Feb\/2013<br \/>\n     &#8230;<br \/>\netc.<br \/>\nExplanation of the command<\/p>\n<p>This may be useful if you want to tweak it.<\/p>\n<p>Break down 1<\/p>\n<p>Get IP and date (unformatted at this stage)<\/p>\n<p>cat access.log | awk &#8216;{print $1 &#8221; &#8221; $4}&#8217;<br \/>\nOutput of the above<\/p>\n<p>69.175.xxx.yyy [13\/Jul\/2013:06:28:31<br \/>\n69.175.xxx.yyy [13\/Jul\/2013:06:28:35<br \/>\n69.175.xxx.yyy [13\/Jul\/2013:06:28:36<br \/>\n69.175.xxx.yyy [13\/Jul\/2013:06:28:41<br \/>\n94.123.xxx.yyy [13\/Jul\/2013:06:32:49<br \/>\n94.123.xxx.yyy [13\/Jul\/2013:06:32:49<br \/>\n94.123.xxx.yyy [13\/Jul\/2013:06:32:50<br \/>\n94.123.xxx.yyy [13\/Jul\/2013:06:32:50<br \/>\n94.123.xxx.yyy [13\/Jul\/2013:06:32:50<br \/>\n94.123.xxx.yyy [13\/Jul\/2013:06:32:51<br \/>\nBreak down 2<\/p>\n<p>Remove the [ bracket with sed. 
Remove the time portion of the output with cut.<\/p>\n<p>cat access.log | awk &#8216;{print $1 &#8221; &#8221; $4}&#8217; | sed &#8216;s\/\\[\/\/&#8217; | cut -d&#8221;:&#8221; -f1<br \/>\nOutput of the above<\/p>\n<p>69.175.xxx.yyy 13\/Jul\/2013<br \/>\n69.175.xxx.yyy 13\/Jul\/2013<br \/>\n69.175.xxx.yyy 13\/Jul\/2013<br \/>\n69.175.xxx.yyy 13\/Jul\/2013<br \/>\n94.123.xxx.yyy 13\/Jul\/2013<br \/>\n94.123.xxx.yyy 13\/Jul\/2013<br \/>\n94.123.xxx.yyy 13\/Jul\/2013<br \/>\n94.123.xxx.yyy 13\/Jul\/2013<br \/>\n94.123.xxx.yyy 13\/Jul\/2013<br \/>\n94.123.xxx.yyy 13\/Jul\/2013<br \/>\nBreak down 3<\/p>\n<p>Swap the IP and date so that the date comes first and the IP second<\/p>\n<p>cat access.log | awk &#8216;{print $1 &#8221; &#8221; $4}&#8217; | sed &#8216;s\/\\[\/\/&#8217; | cut -d&#8221;:&#8221; -f1 | awk &#8216;{print $2 &#8221; &#8221; $1}&#8217;<br \/>\nOutput of the above<\/p>\n<p>13\/Jul\/2013 69.175.xxx.yyy<br \/>\n13\/Jul\/2013 69.175.xxx.yyy<br \/>\n13\/Jul\/2013 69.175.xxx.yyy<br \/>\n13\/Jul\/2013 69.175.xxx.yyy<br \/>\n13\/Jul\/2013 94.123.xxx.yyy<br \/>\n13\/Jul\/2013 94.123.xxx.yyy<br \/>\n13\/Jul\/2013 94.123.xxx.yyy<br \/>\n13\/Jul\/2013 94.123.xxx.yyy<br \/>\n13\/Jul\/2013 94.123.xxx.yyy<br \/>\n13\/Jul\/2013 94.123.xxx.yyy<br \/>\nBreak down 4<\/p>\n<p>Remove duplicate date and IP pairs to get the unique IPs per day<\/p>\n<p>cat access.log | awk &#8216;{print $1 &#8221; &#8221; $4}&#8217; | sed &#8216;s\/\\[\/\/&#8217; | cut -d&#8221;:&#8221; -f1 | awk &#8216;{print $2 &#8221; &#8221; $1}&#8217; | \\<br \/>\nsort | uniq<br \/>\nOutput of the above<\/p>\n<p>13\/Jul\/2013 69.175.xxx.yyy<br \/>\n13\/Jul\/2013 94.123.xxx.yyy<br \/>\nBreak down 5<\/p>\n<p>The IPs themselves are no longer needed, only their count, so drop the IP by printing only the date. 
Then do a uniq -c on the output to get the counts for the dates<\/p>\n<p>cat access.log | awk &#8216;{print $1 &#8221; &#8221; $4}&#8217; | sed &#8216;s\/\\[\/\/&#8217; | cut -d&#8221;:&#8221; -f1 | awk &#8216;{print $2 &#8221; &#8221; $1}&#8217; | \\<br \/>\nsort | uniq | awk &#8216;{print $1}&#8217; | uniq -c<br \/>\nOutput of the above. So there are only two unique hits in our example for the one date.<\/p>\n<p>http:\/\/tech.snathan.org\/tech\/apache\/log_parsing<\/p>\n<p>2 13\/Jul\/2013<\/p>\n","protected":false},"excerpt":{"rendered":"<p>cat access.log | awk &#8216;{print $1 &#8221; &#8221; $4}&#8217; Below are some simple awk\/sed\/etc command line scripts to parse apache logs and get quick statistics Unique visitors per day Where access.log is your combined log file with typical format as below: access.log 69.175.xxx.yyy &#8211; &#8211;&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_angie_page":false,"page_builder":"","footnotes":""},"categories":[5],"tags":[],"class_list":["post-2801","post","type-post","status-publish","format-standard","hentry","category-apache2"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/wp-json\/wp\/v2\/posts\/2801","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/wp-json\/wp\/v2\/comments?post=2801"}],"version-history":[{"count":0,"href":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/wp-json\/wp\/v2\/posts\/2801\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.deuzebranaweb.com.br\/inde
x.php\/wp-json\/wp\/v2\/media?parent=2801"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/wp-json\/wp\/v2\/categories?post=2801"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.deuzebranaweb.com.br\/index.php\/wp-json\/wp\/v2\/tags?post=2801"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}