Web Server Comparison: Microsoft Windows NT Server 4.0
Executive Summary

Windows NT Server 4.0 Is Four Times Faster as a Web Server Than Solaris 2.6 with Sun Web Server 1.0 and Has 10.3 Times Better Price/Performance

Mindcraft tested two Web servers to determine the maximum performance possible from each:

- Microsoft Windows NT Server 4.0, using its IIS Web services, on a Compaq ProLiant 3000
- Sun Solaris 2.6 with Sun Web Server (SWS) 1.0 on a Sun Ultra Enterprise 450
Table 1 shows the peak performance we measured for each Web server in HTTP requests per second (RPS), the peak throughput in megabytes per second, the price of each system, and the price/performance in dollars per RPS.

Table 1: Performance Summary
We tested these systems with Ziff-Davis Benchmark Operation's WebBench 2.0. We created a workload to model published research on Web usage patterns. This workload also includes a significant amount of dynamically generated Web pages. See The Benchmark section for more details.

The benchmark results clearly show that a Windows NT Server 4.0-ProLiant 3000 platform is a more cost-effective, higher-performance Web server solution than a Solaris 2.6-SWS 1.0-Ultra Enterprise 450 platform. The Windows NT Server platform using ISAPI to respond to dynamic Web requests is four times faster than the Solaris-SWS platform and has 10.3 times better price/performance. Furthermore, the Windows NT Server platform using ASPs is 2.1 times faster and has 5.4 times better price/performance than the Solaris-SWS platform.

The Benchmark

Mindcraft used WebBench 2.0 with a workload we created that models the Web server workload characterization published in a paper by Martin F. Arlitt and Carey L. Williamson of the University of Saskatchewan and later corroborated by Jakob Nielsen's analysis of the file access pattern at Sun's Web site. We chose WebBench 2.0 as the scaffold to run the workload because it allows you to build custom workloads easily, it generates detailed result reports with graphs, and it has an easy-to-use graphical interface for controlling the benchmark process.

The Workload

Arlitt and Williamson analyzed Web server logs from six different sites: three academic sites, two scientific research sites, and one commercial Internet service provider. One characteristic common to all of the sites analyzed is what Arlitt and Williamson call "concentration of references," also commonly referred to as "file access pattern" or "file access frequency." They report finding that "10% of the files accessed account for 90% of server requests and 90% of the bytes transferred."

At first, this finding may seem startling. However, consider how you use a Web site. Typically, you start at the home page of a site and then click on a link you are interested in to get to the next page. You usually visit only a small fraction of the pages at a typical Web site. This kind of access pattern follows the Zipf distribution and occurs in many places besides the Internet. For example, new releases of popular movies on videotape account for a much higher percentage of rentals than old classics. Similarly, books on popular "best seller" lists account for a much higher percentage of books borrowed from libraries than other books.

We took the following steps to create a WebBench 2.0 workload based on a Zipf distribution file access pattern:
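One way to build this kind of access pattern is to rank the files in the log by popularity and give each URL a request weight that falls off with its rank. The following minimal sketch illustrates that idea only; the exponent and request count are placeholders rather than Mindcraft's actual procedure, and the exponent would need to be tuned to reproduce the observed 90%/10% concentration.

```python
# Minimal sketch of Zipf-based URL selection (illustrative only; the exponent and
# request count are placeholders, not the values or steps Mindcraft actually used).
import random
from collections import Counter

NUM_FILES = 6400         # unique URLs in the analyzed log file
ZIPF_EXPONENT = 1.0      # popularity of the file at rank r falls off as 1 / r**exponent
NUM_REQUESTS = 100_000   # synthetic requests to draw for this sketch

# Request weight for each file, ordered by popularity rank (1 = most popular).
weights = [1.0 / (rank ** ZIPF_EXPONENT) for rank in range(1, NUM_FILES + 1)]

# Draw synthetic requests following the Zipf weights.
requests = random.choices(range(NUM_FILES), weights=weights, k=NUM_REQUESTS)

# Check the "concentration of references": how many requests hit the top 10% of files.
counts = Counter(requests)
busiest = sorted(counts.values(), reverse=True)[: NUM_FILES // 10]
print(f"Top 10% of files received {sum(busiest) / NUM_REQUESTS:.0%} of the requests")
```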
With this technique, we were able to generate a WebBench workload file. WebBench 2.0 supports up to 100 classes or groups of URLs. Because each class can hold up to 100 URLs and because the Zipf distribution creates a file access pattern in which a very large number of files receive an extremely small number of accesses, we could not use all of the 6400 unique URLs that were in the log file. We did use 704 URLs that simulated 94% of the file accesses captured in the log file. Each of the URL requests that we could not simulate in the workload amounted to less than 0.001% of the total requests. The average file size, based on the log file analysis, was just over 14,700 bytes.

Dynamic Requests

Based on discussions we had with several Web server developers and computer vendors, as well as log file analysis of some very popular commercial Web sites, we decided to make 30% of the HTTP requests "dynamic." A dynamic request is one that a Web server passes to a program, which then provides the response. We used two types of dynamic requests:
We wrote two dynamic request-processing programs for Windows NT Server IIS Web services: an ISAPI module and an ASP that called the ISAPI module. ISAPI modules run in the same process as IIS, so handling a dynamic request does not require a process context switch. ASPs are HTML pages that have an embedded program in a scripting language; in this case we used VBScript.

SWS 1.0 does not provide a programming interface equivalent to ISAPI or ASP; it only supports CGI programs. So we wrote a CGI version of the dynamic request-processing program for the benchmark. This meant that each dynamic request would pay the performance penalty of launching a new process (the sketch after Table 2 illustrates this per-request process model).

WebBench 2.0 Configuration

Because the purpose of this test was to obtain the maximum performance possible, we set up each operating system, each software Web server, each dynamic request-handling program, and the WebBench 2.0 test systems accordingly. Table 2 shows the key WebBench 2.0 configuration parameters we used. The persistent connection percentage was based on information exchanged in the discussions referenced above. The number of requests per persistent connection typically follows some type of distribution. However, because WebBench 2.0 randomly selects values in a specified range, we decided to force this parameter to be 4, the average agreed upon by the companies in our discussions.

Table 2: Key WebBench Configuration Parameters
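For illustration only, here is a hedged sketch of a CGI-style dynamic request handler; it is not Mindcraft's actual program (which targeted ISAPI, ASP, and CGI). The point it shows is that the Web server must start a new process running a program like this for every dynamic request, whereas an ISAPI module is loaded into the Web server's own process and avoids that per-request cost.

```python
#!/usr/bin/env python3
# Hedged sketch of a CGI-style dynamic request handler (not Mindcraft's program).
# The Web server launches a NEW process to run this for each dynamic request.
import os
import sys

def main():
    # CGI passes request information to the new process through environment variables.
    query = os.environ.get("QUERY_STRING", "")

    # Build a small dynamically generated page.
    body = f"<html><body><p>Dynamic response for query: {query}</p></body></html>\n"

    # A CGI program writes its HTTP headers and body to standard output.
    sys.stdout.write("Content-Type: text/html\r\n")
    sys.stdout.write(f"Content-Length: {len(body)}\r\n")
    sys.stdout.write("\r\n")          # blank line ends the headers
    sys.stdout.write(body)

if __name__ == "__main__":
    main()
```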
We wanted to use HTTP 1.1 as the protocol between WebBench and the Web servers. However, we were unable to use HTTP 1.1 persistent connections with SWS 1.0 because WebBench returned an error indicating the feature was not supported, and we could not find an SWS 1.0 configuration parameter to turn persistent connections on. We did leave the SWS keepalive_enable configuration parameter at its default setting of yes. So we had to revert to HTTP 1.0 for SWS. Because WebBench 2.0 will make all requests on one connection if HTTP 1.0 keep-alives are specified, we did not use them for SWS. Based on tests we did on Windows NT Server without persistent connections or keep-alives, we estimate that not being able to use them for SWS 1.0 lowered its measured performance by 10% or less.

We set the WebBench configuration parameter number of worker threads per client to the values shown in Table 2 in order to get the best results for each system. In trial runs, we found that other values for this parameter resulted in lower peak performance. It is the total number of worker threads issuing requests that determines the overall load on a Web server, not the number of test systems.

Performance Analysis

Looking at the Results

WebBench 2.0 gives two primary metrics for comparing Web server performance:

- the total number of HTTP requests per second the server handles
- the total throughput, in bytes per second, that the server delivers
Figure 1 shows the total number of requests per second for both Windows NT Server (WNTS in the figure) and Solaris-SWS (Solaris in the figure). The x-axis shows the total number of test threads used at each data point; a higher number of threads indicates a larger load on the server. The number of test threads is different for Windows NT Server and Solaris-SWS because we were able to obtain the best performance for each platform with the number of threads shown. Figure 2 presents the throughput for each platform.

Figure 1: HTTP Requests/Second Performance (larger numbers are better)

Figure 2: Throughput Performance (larger numbers are better)

In order to understand what the WebBench measurements mean, you need to know how WebBench 2.0 works. It stresses a Web server by using a number of test systems to request URLs. Each WebBench test system can be configured to use multiple worker threads (threads for short) to make simultaneous Web server requests. By using multiple threads per test system, it is possible to generate a large enough load to stress a Web server to its limit with a reasonable number of test systems. The total number of threads making requests to a server is a better basis for comparing the performance of different servers under load than the number of test systems. That is why our graphs show the number of test threads for each data point as well as the number of test systems.
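The worker-thread idea behind this kind of load generation can be sketched as follows. This is not WebBench itself, and the URL, thread count, and duration are placeholders, not the values used in these tests; it only illustrates how several threads on one test system issue simultaneous requests and how requests per second is derived.

```python
# Sketch of a WebBench-style worker-thread load generator (illustrative only).
import threading
import time
import urllib.request

TARGET_URL = "http://webserver.example.test/index.html"   # hypothetical server
THREADS_PER_TEST_SYSTEM = 8                                # placeholder value
TEST_DURATION_SECONDS = 30

completed = 0
lock = threading.Lock()

def worker(stop_time):
    """Each worker thread issues requests back-to-back until time runs out."""
    global completed
    while time.time() < stop_time:
        try:
            with urllib.request.urlopen(TARGET_URL) as response:
                response.read()
            with lock:
                completed += 1
        except OSError:
            pass  # this sketch counts only successful requests

stop = time.time() + TEST_DURATION_SECONDS
threads = [threading.Thread(target=worker, args=(stop,))
           for _ in range(THREADS_PER_TEST_SYSTEM)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Requests per second: the same headline metric WebBench reports.
print(f"{completed / TEST_DURATION_SECONDS:.1f} requests/second")
```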
Because of how it works, WebBench is at its best making peak performance measurements that show the limitations of a Web server platform.

What Are the Bottlenecks?

The readily measured factors that limit the performance of a Web server are:

- server CPU performance
- memory
- disk subsystem performance
- network performance
- operating system and Web server software performance
We will examine each factor individually.

Performance Monitoring Tools

We ran the standard Windows NT performance-monitoring tool, perfmon, on the ProLiant 3000 during the tests to gather performance statistics. Perfmon allows you to select which performance statistics you want to monitor and lets you see them in a real-time chart as well as save them in a log file for later analysis. We logged the processor, memory, network interface, and disk subsystem performance counters for these tests. To collect performance data on the Ultra Enterprise 450 during the tests, we ran vmstat for memory statistics and mpstat for processor-related statistics. These programs output a fixed set of performance statistics that can be displayed or saved in a file.

Server CPU Performance

For the ISAPI test, both of the ProLiant 3000's CPUs were 97.5% utilized at peak performance. We could not increase the ProLiant's CPU utilization by increasing the number of test threads (the test systems were only about 20% utilized at the server's peak performance, so we could expect them to increase the server load as the number of test threads increased). While the ProLiant's CPUs were heavily used, they could have done more. We'll look for other factors besides the CPU that limited the ProLiant 3000's performance.

For the ASP test on the ProLiant 3000, each CPU was 99% utilized at peak performance. The CPUs spent 54% of their time in Privileged Time and 45% in User Time. Because the ProLiant 3000's CPUs were essentially fully utilized, they did contribute to limiting the performance of the system. However, in the Operating System and Web Server Software Performance section below, we will look more closely at how the CPUs were used to better understand the effect of ASPs.

At peak performance, mpstat reported that the Ultra Enterprise 450 had one CPU 95% utilized while the other was 90% utilized. We could not get the Ultra Enterprise 450 to balance the load on its CPUs better. Also, we could not get higher performance by increasing the number of threads on the test systems. We conclude that the Ultra Enterprise 450 was performance-limited by a factor other than its CPUs.

Memory was not a performance limitation for either system during any test, as shown by the monitoring programs. The ProLiant 3000 used about 50 MB of memory for both the ISAPI and ASP tests. The Ultra Enterprise 450 used about 90 MB of memory. Both systems had 512 MB of memory.

Disk Subsystem Performance

The disk activity for the ProLiant 3000 was moderate after the WebBench warm-up period, with about 25 disk accesses per second for both the ISAPI and ASP tests. The disk subsystem was not a performance-limiting factor. The Ultra Enterprise 450's disk subsystem showed activity comparable to that of the ProLiant. Its disk subsystem was not a performance-limiting factor either.

Network Performance

For the ISAPI test, perfmon showed that one network interface card on the ProLiant 3000 used 73.3 Mbits/second of bandwidth and the other used 75.8 Mbits/second at the peak performance point. This is over 73% and 75% of the available bandwidth on each 100Base-TX network and indicates a saturated network. Because of this high network bandwidth utilization, we were not able to increase the ProLiant 3000's CPU utilization closer to 100%. So the networks contributed to limiting the ISAPI performance of the ProLiant 3000.
For the ASP test, each of the ProLiant 3000's network interface cards used about 38 Mbits/second of bandwidth at peak performance. This is less than 40% of the available bandwidth on each network and did not limit the ASP performance.

The bandwidth used on each of the two networks attached to the Ultra Enterprise 450 peaked at 19 Mbits/second. Since this is less than 20% of the available bandwidth on a 100Base-TX network, the networks were not a performance limitation for the Ultra Enterprise 450.

Operating System and Web Server Software Performance

Windows NT Server 4.0 on the ProLiant 3000 is able to handle both static and dynamic HTTP requests quickly. Using ISAPI, it responded to more than 400 dynamic requests/second. Because ISAPI programs run in the same process as IIS, Windows NT Server spends most of its time servicing I/O requests rather than managing context switches between its Web services and the dynamic request-handling program. This shows up in the CPU utilization: both CPUs spent over three times as much time in Privileged Time as in User Time (75% vs. 22%). Windows NT Server was not the ISAPI performance bottleneck for the ProLiant 3000.

The ASP performance is lower than the ISAPI performance because there is more overhead involved in processing an ASP request. For each ASP request, IIS retrieves the file containing the ASP script, parses it, and executes it. This means that more CPU time is spent in User Time than for a comparable request using only an ISAPI program. That is why the total CPU utilization is made up of 45% User Time and 54% Privileged Time for the ASP test, while for the ISAPI test it is 22% User Time and 75% Privileged Time.

Solaris 2.6 performance was clearly hampered by the inability of SWS to handle dynamic HTTP requests quickly. We did test runs with SWS using only static HTTP 1.0 requests and obtained 1149 RPS, which is less than the 1337 RPS IIS delivered with both static and dynamic requests. SWS slowed by about a factor of three for these tests because it had to create a new CGI process for each dynamic request. This burden showed up in the mpstat statistics as over 37,000 context switches per second at peak performance. Based on test runs we did on the ProLiant 3000 using only HTTP 1.0 static requests without HTTP 1.0 keep-alives, we estimate that, had HTTP 1.1 persistent connections been supported, the Ultra Enterprise 450 would have performed about 5% to 10% better than it did. We conclude that the lack of an efficient Web application environment limited the performance of Solaris 2.6.

Conclusion

Windows NT Server 4.0 on a Compaq ProLiant 3000 provides a high-performance platform for heavily used Web sites. This platform handles a mix of static and dynamic HTTP requests faster than Solaris 2.6 with SWS 1.0 on a Sun Ultra Enterprise 450 handles static requests alone. To keep a ProLiant 3000 working at its peak, you need to provide over 200 Mbits/second of 100Base-TX bandwidth. The Web-server performance of Solaris 2.6 with SWS 1.0 on a Sun Ultra Enterprise 450 is limited, making this platform inappropriate for high-volume Web sites with dynamic content.

Price/Performance

We calculated price/performance by dividing the street price of the servers and software tested by the peak requests per second (a short worked note follows Table 4). We obtained the street price of the ProLiant 3000 configuration shown in Table 3 by requesting a quote from a value-added reseller. The street price of the Ultra Enterprise 450 in Table 4 was also obtained from a VAR quote. We did not include sales tax because it varies greatly from locality to locality.

Table 3: Compaq ProLiant 3000 Pricing
Table 4: Sun Ultra Enterprise 450 Pricing
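As a worked note on the dollars-per-RPS calculation described above: the prices in the sketch below are placeholders, not the quoted street prices from Tables 3 and 4, and the only figures taken from the report are the headline ratios.

```python
# Price/performance as described above: street price divided by peak requests per
# second, giving dollars per RPS (lower is better). Prices are placeholders, not
# the quoted street prices in Tables 3 and 4.
def dollars_per_rps(street_price_usd: float, peak_rps: float) -> float:
    """Price/performance: street price divided by peak HTTP requests per second."""
    return street_price_usd / peak_rps

# Worked check of the headline ratios: if the NT platform delivers 4 times the RPS
# and 10.3 times better price/performance (10.3 times fewer dollars per RPS), then
# the Solaris system's street price works out to about 10.3 / 4, i.e. roughly 2.6
# times the NT system's street price.
print(f"Implied Solaris-to-NT price ratio: {10.3 / 4:.1f}")
```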
Products Tested

Configurations and Tuning

The purpose of this benchmark was to find the maximum performance of the Web servers tested. As discussed above, the capabilities of Windows NT Server 4.0 Web services and Solaris 2.6 with SWS differ, and this difference had a significant effect on the performance we measured. Table 5 highlights the relevant features that affected this benchmark.

Table 5: Web Server Capabilities
We configured each system to perform the best that it could. Table 6 shows the configuration of the Compaq ProLiant 3000 we tested. Table 7 describes the Sun Ultra Enterprise 450 configuration we used.

Table 6: Compaq ProLiant 3000 Configuration
Table 7: Sun Ultra Enterprise 450 Configuration
Test Lab

The Test Systems and Network Configurations

Mindcraft ran these tests using a total of 24 test systems, consisting of 12 each of two different types. Table 8 and Table 9 show the configurations of the two types of test systems.

Table 8: Type A Test Systems
Table 9: Type B Test Systems
The test systems were on two dedicated 100Base-TX Ethernets using four eight-port hubs. We balanced the two networks by putting six of each type of test system on each network. Figure 3 shows the test lab configuration.

Figure 3: Test Lab Configuration

Mindcraft Certification

Mindcraft, Inc. conducted the performance tests described in this report between April 14 and May 20, 1998, in our laboratory in Palo Alto, California. Mindcraft used the WebBench 2.0 benchmark to measure performance with the 70% static and 30% dynamic workload described in The Benchmark section above.

Mindcraft certifies that the results reported herein represent the performance of Microsoft Windows NT Server 4.0 on a Compaq ProLiant 3000 computer as measured by WebBench 2.0. Mindcraft also certifies that the results reported herein represent the performance of Sun Solaris 2.6 with Sun Web Server 1.0 on a Sun Ultra Enterprise 450 computer as measured by WebBench 2.0. Our test results should be reproducible by others who use the same test lab configuration as well as the computer and software configurations and modifications documented in this report.

Overall WebBench Results

Compaq ProLiant 3000 ISAPI Results
Compaq ProLiant 3000 ASP Results
Sun Ultra Enterprise 450 CGI Results
NOTICE: The information in this publication is subject to change without notice. MINDCRAFT, INC. SHALL NOT BE LIABLE FOR ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. This publication does not constitute an endorsement of the product or products that were tested. This test is not a determination of product quality or correctness, nor does it ensure compliance with any federal, state or local requirements. The Mindcraft tests discussed herein were performed without independent verification by Ziff-Davis and Ziff-Davis makes no representations or warranties as to the results of the tests. Mindcraft is a registered trademark of Mindcraft, Inc. Product and corporate names mentioned herein are trademarks and/or registered trademarks of their respective companies.
Copyright © 1997-98. Mindcraft, Inc. All rights reserved. Mindcraft is a registered trademark of Mindcraft, Inc. For more information, contact us at: info@mindcraft.com, Phone: +1 (408) 395-2404, Fax: +1 (408) 395-6324