Web Server Comparison: Microsoft Windows NT Server 4.0
Executive Summary

Windows NT Server 4.0 Is Four Times Faster as a Web Server Than Solaris 2.6 with Sun Web Server 1.0 and Has 10.3 Times Better Price/Performance

Mindcraft tested two Web servers to determine the maximum performance possible from each:

- Microsoft Windows NT Server 4.0, using its IIS Web services, on a Compaq ProLiant 3000
- Sun Solaris 2.6 with Sun Web Server (SWS) 1.0 on a Sun Ultra Enterprise 450
Table 1 shows the peak performance we measured for each Web server in HTTP requests per second (RPS), the peak throughput in megabytes per second, the price of each system, and the price/performance in dollars per RPS.

Table 1: Performance Summary
We tested these systems with Ziff-Davis Benchmark Operation's WebBench 2.0. We created a workload to model published research on Web usage patterns. This workload also includes a significant amount of dynamically generated Web pages. See The Benchmark section for more details.

The benchmark results clearly show that a Windows NT Server 4.0-ProLiant 3000 platform is a more cost-effective, higher-performance Web server solution than a Solaris 2.6-SWS 1.0-Ultra Enterprise 450 platform. The Windows NT Server platform using ISAPI to respond to dynamic Web requests is four times faster than the Solaris-SWS platform and has 10.3 times better price/performance. Furthermore, the Windows NT Server platform using ASPs is 2.1 times faster and has 5.4 times better price/performance than the Solaris-SWS platform.

The Benchmark

Mindcraft used WebBench 2.0 with a workload we created that models the Web server workload characterization published in a paper by Martin F. Arlitt and Carey L. Williamson of the University of Saskatchewan and later corroborated by Jakob Nielsen's analysis of the file access pattern at Sun's Web site. We chose WebBench 2.0 as the scaffold to run the workload because it allows you to build custom workloads easily, it generates detailed result reports with graphs, and it has an easy-to-use graphical interface for controlling the benchmark process.

The Workload

Arlitt and Williamson analyzed Web server logs from six different sites: three academic sites, two scientific research sites, and one commercial Internet service provider. One characteristic common to all of the sites analyzed is what Arlitt and Williamson call "concentration of references," also commonly referred to as "file access pattern" or "file access frequency." They report finding that "10% of the files accessed account for 90% of server requests and 90% of the bytes transferred."

At first, this finding may seem startling. However, consider how you use a Web site. Typically, you start at the home page of a site and then click on a link you are interested in to get to the next page. You usually visit only a small fraction of the pages at a typical Web site. This kind of access pattern follows the Zipf distribution and occurs in many places besides the Internet. For example, new releases of popular movies on videotape account for a much higher percentage of rentals than old classics. Similarly, books on popular "best seller" lists account for a much higher percentage of books borrowed from libraries than other books.

We took the following steps to create a WebBench 2.0 workload based on a Zipf distribution file access pattern:
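One way to build this kind of access pattern is to rank the files in the log by popularity and give each URL a request weight that falls off with its rank. The following minimal sketch illustrates that idea only; the exponent and request count are placeholders rather than Mindcraft's actual procedure, and the exponent would need to be tuned to reproduce the observed 90%/10% concentration.

```python
# Minimal sketch of Zipf-based URL selection (illustrative only; the exponent and
# request count are placeholders, not the values or steps Mindcraft actually used).
import random
from collections import Counter

NUM_FILES = 6400         # unique URLs in the analyzed log file
ZIPF_EXPONENT = 1.0      # popularity of the file at rank r falls off as 1 / r**exponent
NUM_REQUESTS = 100_000   # synthetic requests to draw for this sketch

# Request weight for each file, ordered by popularity rank (1 = most popular).
weights = [1.0 / (rank ** ZIPF_EXPONENT) for rank in range(1, NUM_FILES + 1)]

# Draw synthetic requests following the Zipf weights.
requests = random.choices(range(NUM_FILES), weights=weights, k=NUM_REQUESTS)

# Check the "concentration of references": how many requests hit the top 10% of files.
counts = Counter(requests)
busiest = sorted(counts.values(), reverse=True)[: NUM_FILES // 10]
print(f"Top 10% of files received {sum(busiest) / NUM_REQUESTS:.0%} of the requests")
```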
With this technique, we were able to generate a WebBench workload file. WebBench 2.0 supports up to 100 classes or groups of URLs. Because each class can hold up to 100 URLs and because the Zipf distribution creates a file access pattern in which a very large number of files receive an extremely small number of accesses, we could not use all of the 6400 unique URLs that were in the log file. We did use 704 URLs that simulated 94% of the file accesses captured in the log file. Each of the URL requests that we could not simulate in the workload amounted to less than 0.001% of the total requests. The average file size, based on the log file analysis, was just over 14,700 bytes.

Dynamic Requests

Based on discussions we had with several Web server developers and computer vendors, as well as log file analysis of some very popular commercial Web sites, we decided to make 30% of the HTTP requests "dynamic." A dynamic request is one that a Web server passes to a program, which then provides the response. We used two types of dynamic requests:
We wrote two dynamic request-processing programs for Windows NT Server IIS Web services: an ISAPI module and an ASP that called the ISAPI module. ISAPI modules run in the same process as IIS, so handling a dynamic request does not require a process context switch. ASPs are HTML pages that have an embedded program in a scripting language; in this case we used VBScript.

SWS 1.0 does not provide a programming interface equivalent to ISAPI or ASP; it only supports CGI programs. So we wrote a CGI version of the dynamic request-processing program for the benchmark. This meant that each dynamic request would pay the performance penalty of launching a new process (the sketch after Table 2 illustrates this per-request process model).

WebBench 2.0 Configuration

Because the purpose of this test was to obtain the maximum performance possible, we set up each operating system, each software Web server, each dynamic request-handling program, and the WebBench 2.0 test systems accordingly. Table 2 shows the key WebBench 2.0 configuration parameters we used. The persistent connection percentage was based on information exchanged in the discussions referenced above. The number of requests per persistent connection typically follows some type of distribution. However, because WebBench 2.0 randomly selects values in a specified range, we decided to force this parameter to be 4, the average agreed upon by the companies in our discussions.

Table 2: Key WebBench Configuration Parameters
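For illustration only, here is a hedged sketch of a CGI-style dynamic request handler; it is not Mindcraft's actual program (which targeted ISAPI, ASP, and CGI). The point it shows is that the Web server must start a new process running a program like this for every dynamic request, whereas an ISAPI module is loaded into the Web server's own process and avoids that per-request cost.

```python
#!/usr/bin/env python3
# Hedged sketch of a CGI-style dynamic request handler (not Mindcraft's program).
# The Web server launches a NEW process to run this for each dynamic request.
import os
import sys

def main():
    # CGI passes request information to the new process through environment variables.
    query = os.environ.get("QUERY_STRING", "")

    # Build a small dynamically generated page.
    body = f"<html><body><p>Dynamic response for query: {query}</p></body></html>\n"

    # A CGI program writes its HTTP headers and body to standard output.
    sys.stdout.write("Content-Type: text/html\r\n")
    sys.stdout.write(f"Content-Length: {len(body)}\r\n")
    sys.stdout.write("\r\n")          # blank line ends the headers
    sys.stdout.write(body)

if __name__ == "__main__":
    main()
```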
We wanted to use HTTP 1.1 as the protocol between WebBench and the Web servers. However, we were unable to use HTTP 1.1 persistent connections with SWS 1.0 because WebBench returned an error indicating the feature was not supported, and we could not find an SWS 1.0 configuration parameter to turn persistent connections on. We did leave the SWS keepalive_enable configuration parameter at its default setting of yes. So we had to revert to HTTP 1.0 for SWS. Because WebBench 2.0 will make all requests on one connection if HTTP 1.0 keep-alives are specified, we did not use them for SWS. Based on tests we did on Windows NT Server without persistent connections or keep-alives, we estimate that not being able to use them for SWS 1.0 lowered its measured performance by 10% or less.

We set the WebBench configuration parameter number of worker threads per client to the values shown in Table 2 in order to get the best results for each system. In trial runs, we found that other values for this parameter resulted in lower peak performance. It is the total number of worker threads issuing requests that determines the overall load on a Web server, not the number of test systems.

Performance Analysis

Looking at the Results

WebBench 2.0 gives two primary metrics for comparing Web server performance:

- the total number of HTTP requests per second the server handles
- the total throughput, in bytes per second, that the server delivers
Figure 1 shows the total number of requests per second for both Windows NT Server (WNTS in the figure) and Solaris-SWS (Solaris in the figure). The x-axis shows the total number of test threads used at each data point; a higher number of threads indicates a larger load on the server. The number of test threads is different for Windows NT Server and Solaris-SWS because we were able to obtain the best performance for each platform with the number of threads shown. Figure 2 presents the throughput for each platform.

Figure 1: HTTP Requests/Second Performance (larger numbers are better)

Figure 2: Throughput Performance (larger numbers are better)

In order to understand what the WebBench measurements mean, you need to know how WebBench 2.0 works. It stresses a Web server by using a number of test systems to request URLs. Each WebBench test system can be configured to use multiple worker threads (threads for short) to make simultaneous Web server requests. By using multiple threads per test system, it is possible to generate a large enough load to stress a Web server to its limit with a reasonable number of test systems. The total number of threads making requests to a server is a better basis for comparing the performance of different servers under load than the number of test systems. That is why our graphs show the number of test threads for each data point as well as the number of test systems.
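The worker-thread idea behind this kind of load generation can be sketched as follows. This is not WebBench itself, and the URL, thread count, and duration are placeholders, not the values used in these tests; it only illustrates how several threads on one test system issue simultaneous requests and how requests per second is derived.

```python
# Sketch of a WebBench-style worker-thread load generator (illustrative only).
import threading
import time
import urllib.request

TARGET_URL = "http://webserver.example.test/index.html"   # hypothetical server
THREADS_PER_TEST_SYSTEM = 8                                # placeholder value
TEST_DURATION_SECONDS = 30

completed = 0
lock = threading.Lock()

def worker(stop_time):
    """Each worker thread issues requests back-to-back until time runs out."""
    global completed
    while time.time() < stop_time:
        try:
            with urllib.request.urlopen(TARGET_URL) as response:
                response.read()
            with lock:
                completed += 1
        except OSError:
            pass  # this sketch counts only successful requests

stop = time.time() + TEST_DURATION_SECONDS
threads = [threading.Thread(target=worker, args=(stop,))
           for _ in range(THREADS_PER_TEST_SYSTEM)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Requests per second: the same headline metric WebBench reports.
print(f"{completed / TEST_DURATION_SECONDS:.1f} requests/second")
```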
Because of how it works, WebBench is at its best making peak performance measurements that show the limitations of a Web server platform.

What Are the Bottlenecks?

The readily measured factors that limit the performance of a Web server are:

- server CPU performance
- memory
- disk subsystem performance
- network performance
- operating system and Web server software performance
We will examine each factor individually.

Performance Monitoring Tools

We ran the standard Windows NT performance-monitoring tool, perfmon, on the ProLiant 3000 during the tests to gather performance statistics. Perfmon allows you to select which performance statistics you want to monitor and lets you see them in a real-time chart as well as save them in a log file for later analysis. We logged the processor, memory, network interface, and disk subsystem performance counters for these tests. To collect performance data on the Ultra Enterprise 450 during the tests, we ran vmstat for memory statistics and mpstat for processor-related statistics. These programs output a fixed set of performance statistics that can be displayed or saved in a file.

Server CPU Performance

For the ISAPI test, both of the ProLiant 3000's CPUs were 97.5% utilized at peak performance. We could not increase the ProLiant's CPU utilization by increasing the number of test threads (the test systems were only about 20% utilized at the server's peak performance, so we could expect them to increase the server load as the number of test threads increased). While the ProLiant's CPUs were heavily used, they could have done more. We'll look for other factors besides the CPU that limited the ProLiant 3000's performance.

For the ASP test on the ProLiant 3000, each CPU was 99% utilized at peak performance. The CPUs spent 54% of their time in Privileged Time and 45% in User Time. Because the ProLiant 3000's CPUs were essentially fully utilized, they did contribute to limiting the performance of the system. However, in the Operating System and Web Server Software Performance section below, we will look more closely at how the CPUs were used to better understand the effect of ASPs.

At peak performance, mpstat reported that the Ultra Enterprise 450 had one CPU 95% utilized while the other was 90% utilized. We could not get the Ultra Enterprise 450 to balance the load on its CPUs better. Also, we could not get higher performance by increasing the number of threads on the test systems. We conclude that the Ultra Enterprise 450 was performance-limited by a factor other than its CPUs.

Memory was not a performance limitation for either system during any test, as shown by the monitoring programs. The ProLiant 3000 used about 50 MB of memory for both the ISAPI and ASP tests. The Ultra Enterprise 450 used about 90 MB of memory. Both systems had 512 MB of memory.

Disk Subsystem Performance

The disk activity for the ProLiant 3000 was moderate after the WebBench warm-up period, with about 25 disk accesses per second for both the ISAPI and ASP tests. The disk subsystem was not a performance-limiting factor. The Ultra Enterprise 450's disk subsystem showed activity comparable to that of the ProLiant. Its disk subsystem was not a performance-limiting factor either.

Network Performance

For the ISAPI test, perfmon showed that one network interface card on the ProLiant 3000 used 73.3 Mbits/second of bandwidth and the other used 75.8 Mbits/second at the peak performance point. This is over 73% and 75% of the available bandwidth on each 100Base-TX network and indicates a saturated network. Because of this high network bandwidth utilization, we were not able to increase the ProLiant 3000's CPU utilization closer to 100%. So the networks contributed to limiting the ISAPI performance of the ProLiant 3000.
For the ASP test, each of the ProLiant 3000's network interface cards used about 38 Mbits/second of bandwidth at peak performance. This is less than 40% of the available bandwidth on each network and did not limit the ASP performance.

The bandwidth used on each of the two networks attached to the Ultra Enterprise 450 peaked at 19 Mbits/second. Since this is less than 20% of the available bandwidth on a 100Base-TX network, the networks were not a performance limitation for the Ultra Enterprise 450.

Operating System and Web Server Software Performance

Windows NT Server 4.0 on the ProLiant 3000 is able to handle both static and dynamic HTTP requests quickly. Using ISAPI, it responded to more than 400 dynamic requests/second. Because ISAPI programs run in the same process as IIS, Windows NT Server spends most of its time servicing I/O requests rather than managing context switches between its Web services and the dynamic request-handling program. This shows up in the CPU utilization: both CPUs spent over three times as much time in Privileged Time as in User Time (75% vs. 22%). Windows NT Server was not the ISAPI performance bottleneck for the ProLiant 3000.

The ASP performance is lower than the ISAPI performance because there is more overhead involved in processing an ASP request. For each ASP request, IIS retrieves the file containing the ASP script, parses it, and executes it. This means that more CPU time is spent in User Time than for a comparable request using only an ISAPI program. That is why the total CPU utilization is made up of 45% User Time and 54% Privileged Time for the ASP test, while for the ISAPI test it is 22% User Time and 75% Privileged Time.

Solaris 2.6 performance was clearly hampered by the inability of SWS to handle dynamic HTTP requests quickly. We did test runs with SWS using only static HTTP 1.0 requests and obtained 1149 RPS, which is less than the 1337 RPS IIS delivered with both static and dynamic requests. SWS slowed by about a factor of three for these tests because it had to create a new CGI process for each dynamic request. This burden showed up in the mpstat statistics as over 37,000 context switches per second at peak performance. Based on test runs we did on the ProLiant 3000 using only HTTP 1.0 static requests without HTTP 1.0 keep-alives, we estimate that, had HTTP 1.1 persistent connections been supported, the Ultra Enterprise 450 would have performed about 5% to 10% better than it did. We conclude that the lack of an efficient Web application environment limited the performance of Solaris 2.6.

Conclusion

Windows NT Server 4.0 on a Compaq ProLiant 3000 provides a high-performance platform for heavily used Web sites. This platform handles a mix of static and dynamic HTTP requests faster than Solaris 2.6 with SWS 1.0 on a Sun Ultra Enterprise 450 handles static requests alone. To keep a ProLiant 3000 working at its peak, you need to provide over 200 Mbits/second of 100Base-TX bandwidth. The Web-server performance of Solaris 2.6 with SWS 1.0 on a Sun Ultra Enterprise 450 is limited, making this platform inappropriate for high-volume Web sites with dynamic content.

Price/Performance

We calculated price/performance by dividing the street price of the servers and software tested by the peak requests per second (a short worked note follows Table 4). We obtained the street price of the ProLiant 3000 configuration shown in Table 3 by requesting a quote from a value-added reseller. The street price of the Ultra Enterprise 450 in Table 4 was also obtained from a VAR quote. We did not include sales tax because it varies greatly from locality to locality.

Table 3: Compaq ProLiant 3000 Pricing
Table 4: Sun Ultra Enterprise 450 Pricing
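As a worked note on the dollars-per-RPS calculation described above: the prices in the sketch below are placeholders, not the quoted street prices from Tables 3 and 4, and the only figures taken from the report are the headline ratios.

```python
# Price/performance as described above: street price divided by peak requests per
# second, giving dollars per RPS (lower is better). Prices are placeholders, not
# the quoted street prices in Tables 3 and 4.
def dollars_per_rps(street_price_usd: float, peak_rps: float) -> float:
    """Price/performance: street price divided by peak HTTP requests per second."""
    return street_price_usd / peak_rps

# Worked check of the headline ratios: if the NT platform delivers 4 times the RPS
# and 10.3 times better price/performance (10.3 times fewer dollars per RPS), then
# the Solaris system's street price works out to about 10.3 / 4, i.e. roughly 2.6
# times the NT system's street price.
print(f"Implied Solaris-to-NT price ratio: {10.3 / 4:.1f}")
```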
Products Tested

Configurations and Tuning

The purpose of this benchmark was to find the maximum performance of the Web servers tested. As discussed above, the capabilities of Windows NT Server 4.0 Web services and Solaris 2.6 with SWS differ, and this difference had a significant effect on the performance we measured. Table 5 highlights the relevant features that affected this benchmark.

Table 5: Web Server Capabilities
We configured each system to perform the best that it could. Table 6 shows the configuration of the Compaq ProLiant 3000 we tested. Table 7 describes the Sun Ultra Enterprise 450 configuration we used.

Table 6: Compaq ProLiant 3000 Configuration
Table 7: Sun Ultra Enterprise 450 Configuration
Test Lab

The Test Systems and Network Configurations

Mindcraft ran these tests using a total of 24 test systems, consisting of 12 each of two different types. Table 8 and Table 9 show the configurations of the two types of test systems.

Table 8: Type A Test Systems
Table 9: Type B Test Systems
The test systems were on two dedicated 100Base-TX Ethernets using four eight-port hubs. We balanced the two networks by putting six of each type of test system on each network. Figure 3 shows the test lab configuration.

Figure 3: Test Lab Configuration

Mindcraft Certification

Mindcraft, Inc. conducted the performance tests described in this report between April 14 and May 20, 1998, in our laboratory in Palo Alto, California. Mindcraft used the WebBench 2.0 benchmark to measure performance with the 70% static and 30% dynamic workload described in The Benchmark section above.

Mindcraft certifies that the results reported herein represent the performance of Microsoft Windows NT Server 4.0 on a Compaq ProLiant 3000 computer as measured by WebBench 2.0. Mindcraft also certifies that the results reported herein represent the performance of Sun Solaris 2.6 with Sun Web Server 1.0 on a Sun Ultra Enterprise 450 computer as measured by WebBench 2.0. Our test results should be reproducible by others who use the same test lab configuration as well as the computer and software configurations and modifications documented in this report.

Overall WebBench Results

Compaq ProLiant 3000 ISAPI Results
Compaq ProLiant 3000 ASP Results
Sun Ultra Enterprise 450 CGI Results
NOTICE: The information in this publication is subject to change without notice. MINDCRAFT, INC. SHALL NOT BE LIABLE FOR ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. This publication does not constitute an endorsement of the product or products that were tested. This test is not a determination of product quality or correctness, nor does it ensure compliance with any federal, state or local requirements. The Mindcraft tests discussed herein were performed without independent verification by Ziff-Davis and Ziff-Davis makes no representations or warranties as to the results of the tests. Mindcraft is a registered trademark of Mindcraft, Inc. Product and corporate names mentioned herein are trademarks and/or registered trademarks of their respective companies.
Copyright © 1997-98. Mindcraft, Inc. All rights reserved. Mindcraft is a registered trademark of Mindcraft, Inc. For more information, contact us at: info@mindcraft.com, Phone: +1 (408) 395-2404, Fax: +1 (408) 395-6324