http://www.cnblogs.com/sdytzz/archive/2011/01/20/1292892.html

Running webalizer on Windows requires nothing except the win32 build of webalizer. The win32 build of webalizer can be downloaded here. After downloading, extract it into any folder. Then open sample.conf, save it as webalizer.conf, and edit webalizer.conf as follows:

    #
    # Sample Webalizer configuration file
    # Copyright 1997-2000 by Bradford L. Barrett (brad@mrunix.net)
    #
    # Distributed under the GNU General Public License. See the
    # files "Copyright" and "COPYING" provided with the webalizer
    # distribution for additional information.
    #
    # This is a sample configuration file for the Webalizer (ver 2.01)
    # Lines starting with pound signs '#' are comment lines and are
    # ignored. Blank lines are skipped as well. Other lines are considered
    # as configuration lines, and have the form "ConfigOption Value" where
    # ConfigOption is a valid configuration keyword, and Value is the value
    # to assign that configuration option. Invalid keyword/values are
    # ignored, with appropriate warnings being displayed. There must be
    # at least one space or tab between the keyword and its value.
    #
    # As of version 0.98, The Webalizer will look for a 'default' configuration
    # file named "webalizer.conf" in the current directory, and if not found
    # there, will look for "/etc/webalizer.conf".


    # LogFile defines the web server log file to use. If not specified
    # here or on the command line, input will default to STDIN. If
    # the log filename ends in '.gz' (ie: a gzip compressed file), it will
    # be decompressed on the fly as it is being read.

    LogFile C:\WINDOWS\system32\LogFiles\W3SVC529919685\nc080917.log

    # LogType defines the log type being processed. Normally, the Webalizer
    # expects a CLF or Combined web server log as input. Using this option,
    # you can process ftp logs as well (xferlog as produced by wu-ftp and
    # others), or Squid native logs. Values can be 'clf', 'ftp' or 'squid',
    # with 'clf' the default.

    LogType clf

    # OutputDir is where you want to put the output files. This should
    # be a full path name, however relative ones might work as well.
    # If no output directory is specified, the current directory will be used.

    OutputDir E:\wwwroot\banbank\webalizer

    # HistoryName allows you to specify the name of the history file produced
    # by the Webalizer. The history file keeps the data for up to 12 months
    # worth of logs, used for generating the main HTML page (index.html).
    # The default is a file named "webalizer.hist", stored in the specified
    # output directory. If you specify just the filename (without a path),
    # it will be kept in the specified output directory. Otherwise, the path
    # is relative to the output directory, unless absolute (leading /).

    #HistoryName webalizer.hist

    # Incremental processing allows multiple partial log files to be used
    # instead of one huge one. Useful for large sites that have to rotate
    # their log files more than once a month. The Webalizer will save its
    # internal state before exiting, and restore it the next time run, in
    # order to continue processing where it left off. This mode also causes
    # The Webalizer to scan for and ignore duplicate records (records already
    # processed by a previous run). See the README file for additional
    # information. The value may be 'yes' or 'no', with a default of 'no'.
    # The file 'webalizer.current' is used to store the current state data,
    # and is located in the output directory of the program (unless changed
    # with the IncrementalName option below). Please read at least the section
    # on Incremental processing in the README file before you enable this option.

    Incremental yes

    # IncrementalName allows you to specify the filename for saving the
    # incremental data in.
    # It is similar to the HistoryName option where the
    # name is relative to the specified output directory, unless an absolute
    # filename is specified. The default is a file named "webalizer.current"
    # kept in the normal output directory. If you don't specify "Incremental"
    # as 'yes' then this option has no meaning.

    #IncrementalName webalizer.current

    # ReportTitle is the text to display as the title. The hostname
    # (unless blank) is appended to the end of this string (separated with
    # a space) to generate the final full title string.
    # Default is (for English) "Usage Statistics for".

    #ReportTitle Usage Statistics for

    # HostName defines the hostname for the report. This is used in
    # the title, and is prepended to the URL table items. This allows
    # clicking on URL's in the report to go to the proper location in
    # the event you are running the report on a 'virtual' web server,
    # or for a server different than the one the report resides on.
    # If not specified here, or on the command line, webalizer will
    # try to get the hostname via a uname system call. If that fails,
    # it will default to "localhost".

    #HostName localhost

    # HTMLExtension allows you to specify the filename extension to use
    # for generated HTML pages. Normally, this defaults to "html", but
    # can be changed for sites who need it (like for PHP embedded pages).

    #HTMLExtension html

    # PageType lets you tell the Webalizer what types of URL's you
    # consider a 'page'. Most people consider html and cgi documents
    # as pages, but not images and audio files. If no types are
    # specified, defaults will be used ('htm*', 'cgi' and HTMLExtension
    # if different for web logs, 'txt' for ftp logs).

    PageType htm*
    PageType cgi
    #PageType phtml
    #PageType php3
    #PageType pl

    # UseHTTPS should be used if the analysis is being run on a
    # secure server, and links to urls should use 'https://' instead
    # of the default 'http://'. If you need this, set it to 'yes'.
    # Default is 'no'. This only changes the behaviour of the 'Top
    # URL's' table.

    #UseHTTPS no

    # DNSCache specifies the DNS cache filename to use for reverse DNS lookups.
    # This file must be specified if you wish to perform name lookups on any IP
    # addresses found in the log file. If an absolute path is not given as
    # part of the filename (ie: starts with a leading '/'), then the name is
    # relative to the default output directory. See the DNS.README file for
    # additional information.
    #
    # Note that this is not yet supported in the Windows port of Webalizer.

    #DNSCache dns_cache.db

    # DNSChildren allows you to specify how many "children" processes are
    # run to perform DNS lookups to create or update the DNS cache file.
    # If a number is specified, the DNS cache file will be created/updated
    # each time the Webalizer is run, immediately prior to normal processing,
    # by running the specified number of "children" processes to perform
    # DNS lookups. If used, the DNS cache filename MUST be specified as
    # well. The default value is zero (0), which disables DNS cache file
    # creation/updates at run time. The number of children processes to
    # run may be anywhere from 1 to 100, however a large number may affect
    # normal system operations. Reasonable values should be between 5 and
    # 20. See the DNS.README file for additional information.

    #DNSChildren 0

    # HTMLPre defines HTML code to insert at the very beginning of the
    # file. Default is the DOCTYPE line shown below. Max line length
    # is 80 characters, so use multiple HTMLPre lines if you need more.

    #HTMLPre <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

    # HTMLHead defines HTML code to insert within the <HEAD></HEAD>
    # block, immediately after the <TITLE> line. Maximum line length
    # is 80 characters, so use multiple lines if needed.

    #HTMLHead <META NAME="author" CONTENT="The Webalizer">

    # HTMLBody defines the HTML code to be inserted, starting with the
    # <BODY> tag. If not specified, the default is shown below. If
    # used, you MUST include your own <BODY> tag as the first line.
    # Maximum line length is 80 char, use multiple lines if needed.

    #HTMLBody <BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">

    # HTMLPost defines the HTML code to insert immediately before the
    # first <HR> on the document, which is just after the title and
    # "summary period"-"Generated on:" lines. If anything, this should
    # be used to clean up in case an image was inserted with HTMLBody.
    # As with HTMLHead, you can define as many of these as you want and
    # they will be inserted in the output stream in order of appearance.
    # Max string size is 80 characters. Use multiple lines if you need to.

    #HTMLPost <BR CLEAR="all">

    # HTMLTail defines the HTML code to insert at the bottom of each
    # HTML document, usually to include a link back to your home
    # page or insert a small graphic. It is inserted as a table
    # data element (ie: <TD> your code here </TD>) and is right
    # aligned with the page. Max string size is 80 characters.

    #HTMLTail <IMG SRC="msfree.png" ALT="100% Micro$oft free!">

    # HTMLEnd defines the HTML code to add at the very end of the
    # generated files. It defaults to what is shown below. If
    # used, you MUST specify the </BODY> and </HTML> closing tags
    # as the last lines. Max string length is 80 characters.

    #HTMLEnd </BODY></HTML>

    # The Quiet option suppresses output messages. Useful when run
    # as a cron job to prevent bogus e-mails. Values can be either
    # "yes" or "no". Default is "no". Note: this does not suppress
    # warnings and errors (which are printed to stderr).

    #Quiet no

    # ReallyQuiet will suppress all messages including errors and
    # warnings. Values can be 'yes' or 'no' with 'no' being the
    # default. If 'yes' is used here, it cannot be overridden from
    # the command line, so use with caution. A value of 'no' has
    # no effect.

    #ReallyQuiet no

    # TimeMe allows you to force the display of timing information
    # at the end of processing. A value of 'yes' will force the
    # timing information to be displayed. A value of 'no' has no
    # effect.

    #TimeMe no

    # GMTTime allows reports to show GMT (UTC) time instead of local
    # time. Default is to display the time the report was generated
    # in the timezone of the local machine, such as EDT or PST. This
    # keyword allows you to have times displayed in UTC instead. Use
    # only if you really have a good reason, since it will probably
    # screw up the reporting periods by however many hours your local
    # time zone is off of GMT.

    #GMTTime no

    # Debug prints additional information for error messages. This
    # will cause webalizer to dump bad records/fields instead of just
    # telling you it found a bad one. As usual, the value can be
    # either "yes" or "no". The default is "no". It shouldn't be
    # needed unless you start getting a lot of Warning or Error
    # messages and want to see why. (Note: warning and error messages
    # are printed to stderr, not stdout like normal messages).

    #Debug no

    # FoldSeqErr forces the Webalizer to ignore sequence errors.
    # This is useful for Netscape and other web servers that cache
    # the writing of log records and do not guarantee that they
    # will be in chronological order. The use of the FoldSeqErr
    # option will cause out of sequence log records to be treated
    # as if they had the same time stamp as the last valid record.
    # Default is to ignore out of sequence log records.

    #FoldSeqErr no

    # VisitTimeout allows you to set the default timeout for a visit
    # (sometimes called a 'session'). The default is 30 minutes,
    # which should be fine for most sites.
    # Visits are determined by looking at the time of the current
    # request, and the time of the last request from the site. If
    # the time difference is greater than the VisitTimeout value, it
    # is considered a new visit, and visit totals are incremented.
    # Value is the number of seconds to timeout (default=1800=30min)

    #VisitTimeout 1800

    # IgnoreHist shouldn't be used in a config file, but it is here
    # just because it might be useful in certain situations. If the
    # history file is ignored, the main "index.html" file will only
    # report on the current log file's contents. Useful only when you
    # want to reproduce the reports from scratch. USE WITH CAUTION!
    # Valid values are "yes" or "no". Default is "no".

    #IgnoreHist no

    # CountryGraph allows the usage by country graph to be disabled.
    # Values can be 'yes' or 'no', default is 'yes'.

    #CountryGraph yes

    # DailyGraph and DailyStats allow the daily statistics graph
    # and statistics table to be disabled (not displayed). Values
    # may be "yes" or "no". Default is "yes".

    #DailyGraph yes
    #DailyStats yes

    # HourlyGraph and HourlyStats allow the hourly statistics graph
    # and statistics table to be disabled (not displayed). Values
    # may be "yes" or "no". Default is "yes".

    #HourlyGraph yes
    #HourlyStats yes

    # GraphLegend allows the color coded legends to be turned on or off
    # in the graphs. The default is for them to be displayed. This only
    # toggles the color coded legends, the other legends are not changed.
    # If you think they are hideous and ugly, say 'no' here :)

    #GraphLegend yes

    # GraphLines allows you to have index lines drawn behind the graphs.
    # I personally am not crazy about them, but a lot of people requested
    # them and they weren't a big deal to add. The number represents the
    # number of lines you want displayed. Default is 2, you can disable
    # the lines by using a value of zero ('0'). [max is 20]
    # Note, due to rounding errors, some values don't work quite right.
    # The lower the better, with 1,2,3,4,6 and 10 producing nice results.

    #GraphLines 2

    # The "Top" options below define the number of entries for each table.
    # Defaults are Sites=30, URL's=30, Referrers=30 and Agents=15, and
    # Countries=30. TopKSites and TopKURLs (by KByte tables) both default
    # to 10, as do the top entry/exit tables (TopEntry/TopExit). The top
    # search strings and usernames default to 20. Tables may be disabled
    # by using zero (0) for the value.

    #TopSites 30
    #TopKSites 10
    #TopURLs 30
    #TopKURLs 10
    #TopReferrers 30
    #TopAgents 15
    #TopCountries 30
    #TopEntry 10
    #TopExit 10
    #TopSearch 20
    #TopUsers 20

    # The All* keywords allow the display of all URL's, Sites, Referrers
    # User Agents, Search Strings and Usernames. If enabled, a separate
    # HTML page will be created, and a link will be added to the bottom
    # of the appropriate "Top" table. There are a couple of conditions
    # for this to occur. First, there must be more items than will fit
    # in the "Top" table (otherwise it would just be duplicating what is
    # already displayed).
    # Second, the listing will only show those items
    # that are normally visible, which means it will not show any hidden
    # items. Grouped entries will be listed first, followed by individual
    # items. The value for these keywords can be either 'yes' or 'no',
    # with the default being 'no'. Please be aware that these pages can
    # be quite large in size, particularly the sites page, and separate
    # pages are generated for each month, which can consume quite a lot
    # of disk space depending on the traffic to your site.

    #AllSites no
    AllURLs yes
    #AllReferrers no
    #AllAgents no
    AllSearchStr yes
    #AllUsers no

    # The Webalizer normally strips the string 'index.' off the end of
    # URL's in order to consolidate URL totals. For example, the URL
    # /somedir/index.html is turned into /somedir/ which is really the
    # same URL. This option allows you to specify additional strings
    # to treat in the same way. You don't need to specify 'index.' as
    # it is always scanned for by The Webalizer, this option is just to
    # specify _additional_ strings if needed. If you don't need any,
    # don't specify any as each string will be scanned for in EVERY
    # log record. A bunch of them will degrade performance. Also,
    # the string is scanned for anywhere in the URL, so a string of
    # 'home' would turn the URL /somedir/homepages/brad/home.html into
    # just /somedir/ which is probably not what was intended.

    #IndexAlias home.htm
    #IndexAlias homepage.htm

    # The Hide*, Group*, Ignore* and Include* keywords allow you to
    # change the way Sites, URL's, Referrers, User Agents and Usernames
    # are manipulated. The Ignore* keywords will cause The Webalizer to
    # completely ignore records as if they didn't exist (and thus not
    # counted in the main site totals).
    # The Hide* keywords will prevent
    # things from being displayed in the 'Top' tables, but will still be
    # counted in the main totals. The Group* keywords allow grouping
    # similar objects as if they were one. Grouped records are displayed
    # in the 'Top' tables and can optionally be displayed in BOLD and/or
    # shaded. Groups cannot be hidden, and are not counted in the main
    # totals. The Group* options do not, by default, hide all the items
    # that they match. If you want to hide the records that match (so just
    # the grouping record is displayed), follow with an identical Hide*
    # keyword with the same value. (see example below) In addition,
    # Group* keywords may have an optional label which will be displayed
    # instead of the keyword's value. The label should be separated from
    # the value by at least one 'white-space' character, such as a space
    # or tab.
    #
    # The value can have either a leading or trailing '*' wildcard
    # character. If no wildcard is found, a match can occur anywhere
    # in the string. Given a string "www.yourmama.com", the values "your",
    # "*mama.com" and "www.your*" will all match.

    # Your own site should be hidden
    #HideSite *mrunix.net
    #HideSite localhost

    # Your own site gives most referrals
    #HideReferrer mrunix.net/

    # This one hides non-referrers ("-" Direct requests)
    #HideReferrer Direct Request

    # Usually you want to hide these
    HideURL *.gif
    HideURL *.GIF
    HideURL *.jpg
    HideURL *.JPG
    HideURL *.png
    HideURL *.PNG
    HideURL *.ra
    HideURL *.css

    # Hiding agents is kind of futile
    #HideAgent RealPlayer

    # You can also hide based on authenticated username
    #HideUser root
    #HideUser admin

    # Grouping options
    #GroupURL /cgi-bin/* CGI Scripts
    #GroupURL /images/* Images

    #GroupSite *.aol.com
    #GroupSite *.compuserve.com

    #GroupReferrer yahoo.com/ Yahoo!
    #GroupReferrer excite.com/ Excite
    #GroupReferrer infoseek.com/ InfoSeek
    #GroupReferrer webcrawler.com/ WebCrawler

    #GroupUser root Admin users
    #GroupUser admin Admin users
    #GroupUser wheel Admin users

    # The following is a great way to get an overall total
    # for browsers, and not display all the detail records.
    # (You should use MangleAgents to refine further)

    #GroupAgent MSIE Micro$oft Internet Exploder
    #HideAgent MSIE
    #GroupAgent Mozilla Netscape
    #HideAgent Mozilla
    #GroupAgent Lynx* Lynx
    #HideAgent Lynx*

    # HideAllSites allows forcing individual sites to be hidden in the
    # report. This is particularly useful when used in conjunction
    # with the "GroupDomains" feature, but could be useful in other
    # situations as well, such as when you only want to display grouped
    # sites (with the GroupSite keywords). The value for this
    # keyword can be either 'yes' or 'no', with 'no' the default,
    # allowing individual sites to be displayed.

    #HideAllSites no

    # The GroupDomains keyword allows you to group individual hostnames
    # into their respective domains. The value specifies the level of
    # grouping to perform, and can be thought of as 'the number of dots'
    # that will be displayed. For example, if a visiting host is named
    # cust1.tnt.mia.uu.net, a domain grouping of 1 will result in just
    # "uu.net" being displayed, while a 2 will result in "mia.uu.net".
    # The default value of zero disables this feature. Domains will only
    # be grouped if they do not match any existing "GroupSite" records,
    # which allows overriding this feature with your own if desired.

    #GroupDomains 0

    # GroupShading allows grouped rows to be shaded in the report.
    # Useful if you have lots of groups and individual records that
    # intermingle in the report, and you want to differentiate the group
    # records a little more.
    # Value can be 'yes' or 'no', with 'yes'
    # being the default.

    #GroupShading yes

    # GroupHighlight allows the group record to be displayed in BOLD.
    # Can be either 'yes' or 'no' with the default 'yes'.

    #GroupHighlight yes

    # The Ignore* keywords allow you to completely ignore log records based
    # on hostname, URL, user agent, referrer or username. I hesitated in
    # adding these, since the Webalizer was designed to generate _accurate_
    # statistics about a web server's performance. By choosing to ignore
    # records, the accuracy of reports becomes skewed, negating why I wrote
    # this program in the first place. However, due to popular demand, here
    # they are. Use the same as the Hide* keywords, where the value can have
    # a leading or trailing wildcard '*'. Use at your own risk ;)

    #IgnoreSite bad.site.net
    #IgnoreURL /test*
    #IgnoreReferrer file:/*
    #IgnoreAgent RealPlayer
    #IgnoreUser root

    # The Include* keywords allow you to force the inclusion of log records
    # based on hostname, URL, user agent, referrer or username. They take
    # precedence over the Ignore* keywords. Note: Using Ignore/Include
    # combinations to selectively process parts of a web site is _extremely
    # inefficient_!!! Avoid doing so if possible (ie: grep the records to a
    # separate file if you really want that kind of report).

    # Example: Only show stats on Joe User's pages
    #IgnoreURL *
    #IncludeURL ~joeuser*

    # Or based on an authenticated username
    #IgnoreUser *
    #IncludeUser someuser

    # MangleAgents allows you to specify how much, if any, The Webalizer
    # should mangle user agent names. This allows several levels of detail
    # to be produced when reporting user agent statistics. There are six
    # levels that can be specified, which define different levels of detail
    # suppression.
    # Level 5 shows only the browser name (MSIE or Mozilla)
    # and the major version number. Level 4 adds the minor version number
    # (single decimal place). Level 3 displays the minor version to two
    # decimal places. Level 2 will add any sub-level designation (such
    # as Mozilla/3.01Gold or MSIE 3.0b). Level 1 will attempt to also add
    # the system type if it is specified. The default Level 0 displays the
    # full user agent field without modification and produces the greatest
    # amount of detail. User agent names that can't be mangled will be
    # left unmodified.

    #MangleAgents 0

    # The SearchEngine keywords allow specification of search engines and
    # their query strings on the URL. These are used to locate and report
    # what search strings are used to find your site. The first word is
    # a substring to match in the referrer field that identifies the search
    # engine, and the second is the URL variable used by that search engine
    # to define its search terms.

    SearchEngine yahoo.com p=
    SearchEngine altavista.com q=
    SearchEngine google.com q=
    SearchEngine eureka.com q=
    SearchEngine lycos.com query=
    SearchEngine hotbot.com MT=
    SearchEngine msn.com MT=
    SearchEngine infoseek.com qt=
    SearchEngine webcrawler searchText=
    SearchEngine excite search=
    SearchEngine netscape.com search=
    SearchEngine mamma.com query=
    SearchEngine alltheweb.com query=
    SearchEngine northernlight.com qr=
    SearchEngine baidu.com word=
    SearchEngine sina.com.cn word=
    SearchEngine sohu.com word=
    SearchEngine 163.com q=

    # The Dump* keywords allow the dumping of Sites, URL's, Referrers
    # User Agents, Usernames and Search strings to separate tab delimited
    # text files, suitable for import into most database or spreadsheet
    # programs.

    # DumpPath specifies the path to dump the files. If not specified,
    # it will default to the current output directory.
    # Do not use a
    # trailing slash ('/').

    #DumpPath /var/lib/httpd/logs

    # The DumpHeader keyword specifies if a header record should be
    # written to the file. A header record is the first record of the
    # file, and contains the labels for each field written. Normally,
    # files that are intended to be imported into a database system
    # will not need a header record, while spreadsheets usually do.
    # Value can be either 'yes' or 'no', with 'no' being the default.

    #DumpHeader no

    # DumpExtension allows you to specify the dump filename extension
    # to use. The default is "tab", but some programs are picky about
    # the filenames they use, so you may change it here (for example,
    # some people may prefer to use "csv").

    #DumpExtension tab

    # These control the dumping of each individual table. The value
    # can be either 'yes' or 'no'; the default is 'no'.

    #DumpSites no
    DumpURLs yes
    #DumpReferrers no
    #DumpAgents no
    #DumpUsers no
    DumpSearchStr yes

    # End of configuration file. Have a nice day!

Feel free to copy my configuration directly. Also, the IIS log format must be set to NCSA (NCSA Common Log File Format). This step is important: if you see "truncating oversized username" while analyzing the logs, this setting is the cause. :) Then just run webalizer.exe to analyze the logs.
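Webalizer reads the IIS log as a CLF (NCSA Common) log, which is presumably why the NCSA setting matters. For illustration only, here is a minimal Python sketch of the CLF record layout, `host ident authuser [date] "request" status bytes`. `parse_clf` and `ClfRecord` are hypothetical names, the sample lines are made up, and this is not webalizer's own parser. A W3C extended line puts a date and a time where CLF expects the host and ident fields, so it does not fit the pattern. That mismatch is one plausible way to end up with the username warning mentioned above.

```python
import re
from typing import NamedTuple, Optional

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


class ClfRecord(NamedTuple):
    host: str
    user: str      # authenticated username, "-" when there is none
    date: str
    request: str
    status: int
    size: int      # response size in bytes; CLF writes "-" for zero


def parse_clf(line: str) -> Optional[ClfRecord]:
    """Parse one CLF line; return None if the line does not have the CLF layout."""
    m = CLF_PATTERN.match(line.rstrip("\r\n"))
    if m is None:
        return None
    size = 0 if m["bytes"] == "-" else int(m["bytes"])
    return ClfRecord(m["host"], m["user"], m["date"], m["request"],
                     int(m["status"]), size)


# Made-up sample lines for illustration
print(parse_clf('192.168.0.5 - - [17/Sep/2008:10:15:32 +0800] "GET /index.htm HTTP/1.1" 200 5120'))
print(parse_clf("2008-09-17 02:15:32 192.168.0.5 GET /index.htm - 80 - 200"))  # None
```

The pattern only accepts plain CLF. A Combined log line with trailing referrer and user agent fields would need two more quoted groups.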
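The VisitTimeout comment in the configuration describes how visits are counted. A request from a site starts a new visit when more than VisitTimeout seconds have passed since that site's previous request. Below is a minimal Python sketch of that rule. It assumes the requests are already in chronological order as (host, Unix timestamp) pairs. `count_visits` is a hypothetical helper, not part of webalizer.

```python
from typing import Dict, Iterable, Tuple

VISIT_TIMEOUT = 1800  # seconds; same as the VisitTimeout default (30 minutes)


def count_visits(requests: Iterable[Tuple[str, int]],
                 timeout: int = VISIT_TIMEOUT) -> int:
    """Count visits from chronologically ordered (host, timestamp) pairs.

    A request starts a new visit if its host has not been seen yet, or if
    more than `timeout` seconds passed since that host's previous request.
    """
    last_seen: Dict[str, int] = {}
    visits = 0
    for host, ts in requests:
        prev = last_seen.get(host)
        if prev is None or ts - prev > timeout:
            visits += 1
        last_seen[host] = ts
    return visits


log = [
    ("10.0.0.1", 0),
    ("10.0.0.1", 600),   # 600 s after the previous request: same visit
    ("10.0.0.2", 700),   # first request from this host: new visit
    ("10.0.0.1", 2500),  # 1900 s after the previous request: new visit
]
print(count_visits(log))  # 3
```

A gap of exactly 1800 seconds still counts as the same visit, because the comment says a new visit starts only when the difference is *greater than* the timeout.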
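The Hide*, Group*, Ignore* and Include* values share the wildcard rule given in the configuration comments. A leading `*` matches the end of the string, a trailing `*` matches the start, and a value with no wildcard matches anywhere. Here is a small Python sketch of that rule. `webalizer_match` is a hypothetical name, and webalizer's own C code may differ in details.

```python
def webalizer_match(pattern: str, value: str) -> bool:
    """Sketch of the Hide*/Group*/Ignore* matching rule described in webalizer.conf."""
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])     # leading '*': match the end
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])  # trailing '*': match the start
    return pattern in value                    # no wildcard: match anywhere


# The three examples from the configuration comments all match
for p in ("your", "*mama.com", "www.your*"):
    print(p, webalizer_match(p, "www.yourmama.com"))
```

This sketch compares case-sensitively. That is consistent with the configuration above listing both `*.gif` and `*.GIF`, but whether webalizer itself is case-sensitive here is an assumption based on that listing.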
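Each SearchEngine line pairs a substring that identifies the engine in the referrer with the query variable that carries the search terms. The Python sketch below applies three of those pairs to a referrer URL. `search_string` is a hypothetical helper. It matches the variable name exactly rather than as a raw substring, and it assumes UTF-8 percent-encoding (the default of `urllib.parse.parse_qsl`). It is therefore an approximation of what webalizer does, not a copy of it.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit

# A few (referrer substring, query variable) pairs from the SearchEngine lines above
SEARCH_ENGINES = [
    ("yahoo.com", "p="),
    ("google.com", "q="),
    ("baidu.com", "word="),
]


def search_string(referrer: str) -> Optional[str]:
    """Return the search terms in a referrer URL, or None if none are found."""
    for engine, variable in SEARCH_ENGINES:
        if engine not in referrer:
            continue
        name = variable.rstrip("=")  # "q=" -> "q"
        for key, value in parse_qsl(urlsplit(referrer).query):
            if key == name:
                return value  # parse_qsl already decodes '+' and %XX escapes
        return None  # engine recognised, but no search variable present
    return None


print(search_string("http://www.google.com/search?hl=zh-CN&q=webalizer+windows"))  # webalizer windows
```

The first matching engine wins, so the order of the pairs matters when one engine's substring could appear inside another engine's referrer.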