CS 252 Project:
Characterization of the "Hollow Client" Processor in a Distributed Design Environment
Naji Ghazal [ naji@eecs.berkeley.edu ] and
Wing Leung [ wleung@ic.eecs.berkeley.edu ]
1.0 Introduction
In today's world of design, new needs have arisen from the recent explosion of Internet use throughout the world. As the world grows closer together, it becomes necessary to provide for wide-area distribution of design activity, animation and direct manipulation of design data, and distributed and dynamic application deployment. In an effort to meet these needs, a new Web-based system architecture for computing (see Figure 1) has been proposed[1]: give the user direct access to the wealth of resources and services throughout the Net through the ubiquitous World-Wide-Web, and exploit the computing power of available dedicated servers to reduce the user's computing and memory requirements. The user can then carry a "hollow client[2]" machine, one that is hollow of any major tools and functions permanently residing on it and that "caches" in applications on demand. The client machine runs locally only the portions of the applications needed to interact and communicate with the application servers, in order to provide satisfactory real-time performance for the user.

FIGURE 1. Distributed Design System: the "Hollow Client" Approach.
In this new distributed environment, the means of achieving high performance for the user will change from those currently used for "centralized" uniprocessor systems, as the bottlenecks in the system will be different. The interaction between the client machine and its data will be across wide-area networks, as opposed to local networks or LANs, and the most frequent tasks performed locally on it will change. There are several aspects of the system that can be examined to improve performance:
- Improving the network bandwidth and latency:
A problem that will be dealt with on a national or global scale.
- Determining the appropriate partition of tasks:
Finding which processes to migrate to the client and which to keep on the servers for a given application is critical to performance, and it must be adaptable to the computing power of the client (and the servers), and the network bandwidth.
- Appropriate caching on the client's memory system:
Data behavior and lifetimes on the client have to be examined to optimize the client's operating system caching policies, as well as memory system sizes and configurations.
- Appropriate communication protocols between the client and server sides:
Applications must be written with the network in mind. The client side of the application must communicate with the server side in such a way that the dependence on the network bandwidth is minimized.
- Optimizing the execution of the hollow client's most frequent tasks:
The set of operations that will be most commonly executed on the client must be examined, so that the client processor architecture can be optimized toward them.
Our area of interest for this project lies within the client processor. We believe the set of operations that must be performed locally on even the "lightest" hollow client will be user-interactive, response-time-critical functions, such as displaying, editing, and navigating data. This becomes even more necessary as the World-Wide-Web continues to carry more complex forms of communication, such as hyper-media, multi-media and 3D graphics. Response time for such operations will set a lower bound on the performance requirements of the hollow client, and on its feasibility as a "network computer," even with ideal network and memory systems. We can quantify the performance requirements by examining the computing involved in carrying out these common operations, using current design and network-based applications on common scientific workstations.
In this project, we traced a number of graphical editing/displaying applications that represent some of tomorrow's interactive display/edit portions of distributed applications. We determined the number of instructions and the execution time involved in completing each of these operations. Given our speculations about tomorrow's typical microprocessor architecture, we derived a lower bound on the rate of execution (MIPS) required for the hollow client. We also explored the dynamic instruction mixes and searched for special optimization features to which they might lend themselves. In addition, as many of these operations involve network access, we examined the conditions required to minimize the impact of the network, so that the above MIPS lower bound is the dominant factor in the ability of the client processor to perform the common operations in real-time. We estimated the ranges of average transfer data sizes and transfer (communication) frequencies over the Net that would allow real-time response at the client, given sufficient client processor performance.
In the sections that follow, we describe the work on which we based our study and the approach we took to obtain our results. We then provide validation for our methodology and present the results, followed by our conclusions and thoughts on future work.
2.0 Related Work
Our research group [1], headed by Prof. Richard Newton, has begun working on a system for distributed design integration, using the "hollow client" approach. Studies [2] have shown that compute-intensive jobs such as compilations and simulations can reach high performance more easily if run on dedicated servers than if run on a client machine. In looking at developments in the use of servers for computing, we have seen that the emphasis so far has been on systems spanning local-area networks. In that domain, a promising solution is the X Terminal[3]. Unfortunately, the communication protocol that the X Terminal uses with its servers is not suited to wide-area networks, as it operates at too low a level, causing forbiddingly excessive network traffic during program execution. This left us with no previous studies on the kind of computer we see in the "hollow client."
Next, we explored the performance issues related to the World-Wide-Web[4], which is seen as the medium through which the hollow client will interact with its applications and data because of its ubiquity, flexibility and allowance for platform independence. We found that the bottleneck is the network bandwidth, as expected. We also saw that an effective way of improving performance with a Web server is to limit simultaneous client requests, which supports our rejection of the X terminal. We also learned that web server performance is highly sensitive to the average file size transferred. In addition, a study of the HTTP network transfer protocol for the Web[5] showed us that its current version is inefficient, as it uses excessive network interaction to bring up a web page with images or other media in it. As a result, all indications favored including any real-time functions of an application in the set of operations run locally on the hollow client.
3.0 Technical Approach
In order to determine the performance required for real-time displaying, editing, and network interaction on the hollow client, we traced such operations using current applications run on several machines, and gathered dynamic instruction stream data (instruction mix, instruction count, execution time). We chose one of the machines to be a low-end workstation that we felt represents the minimum acceptable real-time performance. From the traces, we derived the minimum rate of execution the hollow client machine would need to maintain to provide the user with satisfactory response time, assuming no further constraints imposed by the memory system or the network bandwidth. To validate this result, we characterized the interaction of the client with the network in terms of its average transfer data size and average frequency of network transfer. From that we found the conditions that minimize the impact of network behavior on the execution time of a typical network access operation. Namely, we obtained an estimate of the bounds on those parameters (size and frequency of transfer) under which the time the client spends executing a network access and, say, displaying the data dominates the total time to complete the network transaction (as opposed to the network latency dominating).
3.1 Finding the right computing resources and tracing tools
Having determined that we needed dynamic instruction stream information on a number of applications that involve editing and displaying in graphical and hyper-text forms, and on a number of X Windows applications, we investigated a suite of tracing tools[6]. We examined Qpt, ATOM, Shade, Pixie, and SimOs [7] (a new powerful sparc simulator with access mainly limited to the Stanford Architecture graduate student group and Sun Microsystems). After much learning about the tools, we were able to make use of Pixie[8] and some of Shade. From these two tools, which instrument a binary executable and allow it to be run like the original, we were able to obtain instruction and cycle counts and execution time for the instrumented applications. We also profiled the applications (disassembling the instruction stream) and examined the instruction content of the most frequently run procedures of each application.
To give our traces more generality, we traced our applications on multiple RISC workstations in our CAD group cluster: a Dec Alpha, two Dec MIPS, an SGI MIPS, and a Sun Sparc (see Appendix A for architectural information on the machines). For the first three types of machines, we used Pixie; for the Sun, a program called "ifreq" from Shade was the only one that gave the traces we needed.
3.2 Selecting the Hollow Client common operations to trace
As argued in Section 2.0, the set of functions that the hollow client will most likely run locally consists of display, edit, and browse (over the network) operations. We wanted to capture what we saw as tomorrow's common design entry/editing environment, so we chose a number of editing actions in graphical tools. We also wanted to trace Internet browsing, which we see as the most common way of publishing and communicating documents in the future (as is already taking place). The set of operations we chose to test is:
- Editing: drawing lines, placing objects, moving objects, popping up or closing windows, and 2-dimensional panning or zooming.
- Immersive Navigating: panning/zooming (3D rendering) and animations.
- Input/Output (Network Access): browsing documents and loading/saving of data.
3.3 Tracing the Hollow Client operations
For our set of chosen operations, we wanted to trace multiple applications for each in order to minimize the dependence on the specific algorithms used to perform the desired actions. We also ran them all on multiple machines, whenever possible, to minimize the dependence on the specific instruction set of a certain machine.
3.3.1 Tracing graphical and 3D tools and the X server
For graphical tools, we consulted a previous CS252 project paper on graphics benchmarks[9] to learn about their content and their experience with the X server. Having realized the heavy role it plays in running X applications, we eventually succeeded, with considerable difficulty, in including the X server in all the traces on one of the machines (the Dec MIPS), thanks to our system administrator Judd Reiffin. For traces on other machines, we estimated the effect of the X server from its behavior on that Dec MIPS. For editing, we chose Xfig (a drawing tool), StateCharts Editor[10] (an FSM editor in Tcl-Tk), and the Magic Physical Layout Editor (although only for a limited number of traces). We were also fortunate to be able to instrument our group's 3D VRML Browser "InteractND"[11] to trace immersive navigation, 3D rendering, and animation, as we foresee them as common means of communication in the future.
3.3.2 Tracing network access and exploring the impact of the network
Finally, for network access operations, we needed an Internet browser such as Netscape to capture the network interaction and Web browsing. We were able to instrument Mosaic after obtaining its source code and compiling it on the different platforms.
To characterize the impact of the network on those operations that involve network access, we wrote a small client/server pair of programs that mimics what we see as a suitable data transfer protocol. Even though our traces from Mosaic gave us information on loading data from the network and displaying it, we were apprehensive about restricting our network access tracing to a tool that uses an old version of the HTTP protocol, which is reportedly wasteful in network traffic. We also wanted to define the region of average transfer data size (representing HTML page or Java applet sizes) and average network access frequency in which the client processor remains busy with local processing rather than waiting on the network, that is, the region where the MIPS lower bound on the processor is the governing factor in the feasibility of the client processor. We did so by using the client/server program to repeatedly send/receive a file of a given size, over a range of file sizes, and adding a variable number of register-register instructions between network accesses. For each iteration, we measured the instruction count as well as the execution time.
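A minimal Python sketch of this kind of measurement loop is given here for illustration. It is not the original client/server pair: it assumes a local socket pair in place of a real network, a hypothetical `echo_server` thread, and a plain integer loop (`local_work`) standing in for the block of register-register instructions. All names and default values are assumptions.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (a single recv may return fewer)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(sock, payload_size, iterations):
    """Stand-in server: receive one payload per iteration and send it back."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, payload_size))


def local_work(n):
    """Stand-in for the integer instruction block between transfers."""
    acc = 0
    for i in range(n):
        acc += i
    return acc


def time_per_iteration(payload_size, work_units, iterations=20):
    """Average seconds for one send/receive plus one block of local work."""
    client, server = socket.socketpair()
    worker = threading.Thread(
        target=echo_server, args=(server, payload_size, iterations)
    )
    worker.start()
    payload = b"x" * payload_size
    start = time.perf_counter()
    for _ in range(iterations):
        client.sendall(payload)
        reply = recv_exact(client, payload_size)
        assert reply == payload
        local_work(work_units)
    elapsed = time.perf_counter() - start
    worker.join()
    client.close()
    server.close()
    return elapsed / iterations
```

Sweeping `payload_size` and `work_units` over a grid would give one timing per (file size, instruction block) pair, which is the shape of measurement described above; the absolute numbers from a local socket pair say nothing about a wide-area network.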
3.3.3 Isolating the operations' trace information from the applications traced
The challenge here was to isolate trace information that involves only the specific desired operation from the complete application execution, and to make sure that the traces were not sensitive to the interactive nature of the tools (variable human think time, etc.). This involved performing a sufficiently large number of iterations (100-200) to reduce the application startup portion to < 5% of total execution time, the number of system calls to < 2%, and the effect of variable mouse movement to < 1%.
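As a rough illustration of how an iteration count in this range can follow from the 5% startup target, the sketch below solves for the smallest number of repetitions that pushes a fixed startup cost under a given fraction of total time. The function name and the 2-second startup time in the usage note are assumptions; only the 5% threshold comes from the text.

```python
import math


def min_iterations(startup_s, per_op_s, max_startup_fraction=0.05):
    """Smallest n with startup_s / (startup_s + n * per_op_s) strictly
    below max_startup_fraction."""
    if per_op_s <= 0 or not 0 < max_startup_fraction < 1:
        raise ValueError("per_op_s must be positive and the fraction in (0, 1)")
    # startup / (startup + n * op) < f   <=>   n > startup * (1 - f) / (f * op)
    bound = startup_s * (1 - max_startup_fraction) / (max_startup_fraction * per_op_s)
    return math.floor(bound) + 1
```

For example, with a hypothetical 2.0 s startup and 0.278 s per operation, `min_iterations(2.0, 0.278)` returns 137.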
3.4 Evaluating the Hollow Client traces
For each operation traced, we obtained its dynamic instruction mix (frequencies of ALU instructions, loads, stores, branches, floating point instructions, nops, and system calls), its total instruction and cycle counts, and its execution time. We also disassembled a listing of the most frequently run procedures to examine their instruction content. The underlying rule we accepted here is that real-time performance tested on the lowest performance workstation is satisfactory, and we used that worst case measure as an upper bound on execution time to be spent for any of the hollow client operations.
3.4.1 Determining the lower bound on the hollow client performance
To determine the computing power requirement on the hollow client, we used the numbers for the most taxing operations (the longest execution time on the low end workstation, with the largest number of instructions taken). With this information, we calculated the MIPS measured and converted it to fit our model of the hollow client's processor architecture. The model we used assumes that the near future's typical processor will have instruction set and processor architectures similar to those of today's typical RISC machine.
3.4.2 Comparing the hollow client typical instruction mix with that of typical RISC
We examined the instruction stream from the traces a) to search for any radical changes/optimizations that seem suitable to add to the current RISC architecture, and/or b) to validate our assumption of treating the hollow client architecture as similar to today's RISC processor architecture. To that end, the trace tool provided us with the total percentage of load instructions followed by other loads, and a value for the average basic block size. For integer instruction parallelism, the task was more involved. We wrote a script that looked at the disassembled list of commonly executed procedures and counted the number of potentially parallelizable ALU instructions in each procedure. We searched for eight-, four-, and two-instruction parallelism.
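One way such a count could be made is sketched below in Python. This is not the script used in the project: it assumes instructions have already been parsed into hypothetical `(opcode, dest, sources)` tuples, uses an illustrative set of MIPS-like ALU mnemonics, and treats a group as parallelizable only if it contains no register dependence of any kind.

```python
# Illustrative ALU mnemonics; a real listing would need the full opcode set.
ALU_OPS = {"add", "addu", "sub", "subu", "and", "or", "xor", "sll", "srl", "slt"}


def is_independent_group(group):
    """True if every instruction is an ALU op and none depends on another."""
    written = set()
    read = set()
    for opcode, dest, sources in group:
        if opcode not in ALU_OPS:
            return False
        # RAW or WAW hazard against an earlier instruction in the group.
        if dest in written or any(src in written for src in sources):
            return False
        # WAR hazard: overwriting a register an earlier instruction reads.
        if dest in read:
            return False
        written.add(dest)
        read.update(sources)
    return True


def parallel_fraction(instructions, width):
    """Fraction of instructions that fall in non-overlapping, independent
    ALU groups of the given width (a greedy left-to-right scan)."""
    if not instructions:
        return 0.0
    covered = 0
    i = 0
    while i + width <= len(instructions):
        if is_independent_group(instructions[i:i + width]):
            covered += width
            i += width
        else:
            i += 1
    return covered / len(instructions)
```

Calling `parallel_fraction(procedure, width)` with `width` set to 8, 4, and 2 gives one percentage per group size. The greedy scan and the strict hazard test are design choices of this sketch; a scheduler that reorders instructions would find more parallelism.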
4.0 Design Rationale
In determining the minimum performance requirements for the hollow client, it is suspected at first that the bottleneck to define is in the system's network bandwidth and latency, and if not there, in the machine's memory capacity and configuration. Our claim here, however, is that with careful design of the system architecture and efficient communication protocols between the client and its servers (as well as between the servers themselves), the client machine is still faced with an ultimate lower bound, that of executing the interactive edit/display operations with real-time performance satisfactory to the user.
This lower bound for the future "network computer" can be estimated with sufficient accuracy by examining current tools run on current machines. We believe RISC architecture, which is more standardized than earlier designs, will be the baseline for processor design in the near future, and the next leap in architecture technology is not foreseen to take place anytime soon. Hence, using instruction counts from traces of today's machines directly to compute the execution rate of the hollow client is a good approximation, especially if multiple instruction set architectures are traced and normalized to a generic one.
Another design decision we made is the choice of tools we traced to mimic the hollow client's most frequently executed functions. It is clear that communication increasingly involves more than just plain text as a medium. With greater and cheaper computing power, more graphics and multi-media will be employed in design and communication. For that reason, we chose a number of tools that have graphics and 3D immersive capabilities. To add accuracy to the traces, we were able to instrument the X server and trace it along with the applications run on it, thus capturing the full set of operations we see as those executed locally on the client.
Finally, we noticed that the time and instruction count involved in loads, saves, and other network interactions are highly dependent on the network transfer protocol, transfer data size, and frequency of network transfer. We believe we were able to overcome that variability by defining the premises under which the network dependence is minimized. In doing so, we also discovered some interesting results about the performance of network access operations as the transfer data size varies, as well as the number of instructions inserted between network accesses.
5.0 Technical Results and Analysis
With the available set of tracing tools, different RISC machines, and the applications successfully instrumented for tracing, we were able to characterize what we believe is a minimal covering set of operations that have to be run locally on a hollow client machine in a distributed design system. We tabulated the trace outputs, characterized the computing involved in completing each operation on the traced machines, and converted those results into our timing model to deduce the equivalent computing requirements for the hollow client.
5.1 Exploring the dynamic instruction mix of the common operations
We traced nine edit/display, navigation, and network access operations. Appendix B contains the complete tables of results we obtained.
5.1.1 Dynamic Instruction Frequencies
We first examined the dynamic instruction mixes of the common operations on three RISC machines; the average instruction frequencies are shown in Figure 2 and Figure 3. Note that X server instructions are included in the instruction mixes, as we actually traced the X server on a Dec MIPS machine and estimated its contribution to the traces on the other two machines.

FIGURE 2. Dynamic instruction frequencies of editing and navigating operations.

FIGURE 3. Dynamic instruction frequencies of network access operations.
The instruction mixes we obtained are similar to a typical RISC instruction mix, which allows us to use a simple timing model for the hollow client to derive its performance requirements. For network access traces, we noticed that the performance is highly sensitive to the file transfer size, so we used only file sizes that have been confirmed to be small enough for real-time performance not to be affected by the network, as shown in our study in Section 5.2. We also made sure the sizes would be small enough to fit in all the machines' caches (worst case, 16KB), so as to minimize the effect of the memory system.
5.1.2 Minimum MIPS requirements to achieve real-time performance
The average total number of instructions and the execution time used to complete each operation are shown in Figure 4 and Figure 5. From these results, we can estimate the worst-case performance shown to be acceptable on the machines we traced. We take the highest instruction count seen on the fastest machine for an operation (14M instructions on the SGI MIPS R4400) and divide it by the execution time seen on the slowest machine for that same operation (0.278 sec on the Dec MIPS) to get the MIPS lower bound, in terms of the MIPS R4400 instruction set and clock frequency, which is 50.4 MIPS. We must then convert this number to that of the hollow client processor.
FIGURE 4. Instruction count for each common operation type.

FIGURE 5. Execution time for each common operation type.
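The lower-bound arithmetic above can be reproduced in a few lines of Python. The function name is an assumption; the two inputs (14M instructions, 0.278 s) are the figures quoted in the text.

```python
def mips_lower_bound(instruction_count, execution_time_s):
    """Millions of instructions per second needed to finish in the given time."""
    return instruction_count / execution_time_s / 1e6


bound = mips_lower_bound(14e6, 0.278)
print(f"{bound:.1f} MIPS")  # prints "50.4 MIPS"
```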
We can estimate the average CPI (clock cycles per instruction) of the hollow client with the following timing model: given the average instruction frequencies (Fi) in Figure 2 and Figure 3, along with our estimate of instruction latency (CPIi) for each of the main instruction classes, the average CPI can be calculated. Our knowledge of well-designed machines such as the MIPS R4400 or Alpha, and of the textbook RISC processor, gives us estimates of the average CPI for each instruction type. For integer ALU instructions, we can assume a 2-4 way superscalar setting, which would result in an average latency of ~0.65 clks. Loads and stores should take about 1 clk plus an overall estimated miss penalty of ~0.3 clks. Branches will be very well predicted, with about 5% misprediction in the worst case and very little miss penalty. A floating point unit seems necessary as part of a hollow client machine if graphics are to be a regular part of the operations employed by users. A normal latency per floating point operation is about 4 clks, but due to pipelining, and since these operations come in clusters, a better estimate would be about 3 clks. We use the latencies in the following table in the CPI calculation:
TABLE 1. Typical Instruction latencies in a 1990s RISC Processor
| | ALU (int) | Load | Store | Branch | Float. Pt. |
| CPI | 0.65 | 1.3 | 1.3 | 1.05 | 3.0 |
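Assuming the CPI calculation referred to here is the standard frequency-weighted sum over the instruction classes of Table 1 (an assumption consistent with the definitions of Fi and CPIi above), it can be written as:

```latex
\mathrm{CPI}_{\mathrm{avg}} = \sum_{i} F_i \times \mathrm{CPI}_i ,
\qquad i \in \{\text{ALU},\ \text{Load},\ \text{Store},\ \text{Branch},\ \text{Float. Pt.}\}
```

Here each F_i is the fraction of dynamically executed instructions in class i, so the F_i sum to 1 over the classes considered.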

We also know, from the traces as well as the MIPS literature, that the average CPI of the MIPS R4400 is 1.0. Since the hollow client's estimated CPI is the same as the R4400's, as expected, the MIPS requirements of the two machines are the same:
MIPS Lower Bound of the Hollow Client = ~50 MIPS
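As an illustration of the timing model, the sketch below applies the Table 1 latencies to a single instruction mix. The mix used is the Render (3-dimension) row for the SGI MIPS from Appendix B, not the averaged frequencies of Figure 2 and Figure 3. Ignoring the NOP column and renormalizing the remaining percentages are assumptions of this sketch, as are the function and variable names.

```python
# Per-class latencies from Table 1 (clock cycles per instruction).
LATENCY = {"alu": 0.65, "load": 1.3, "store": 1.3, "branch": 1.05, "fp": 3.0}


def average_cpi(mix_percent, latency=LATENCY):
    """Frequency-weighted CPI; the mix is renormalized so it sums to 1."""
    total = sum(mix_percent.values())
    if total <= 0:
        raise ValueError("instruction mix must have a positive total")
    return sum(mix_percent[k] * latency[k] for k in mix_percent) / total


# Render (3-dimension), SGI MIPS column of Table 3; NOP column left out.
render_3d_sgi = {"alu": 30.2, "load": 32.3, "store": 15.9, "branch": 14.8, "fp": 5.5}
cpi = average_cpi(render_3d_sgi)
```

For this one load-heavy row the result comes out somewhat above 1; the value of about 1.0 quoted in the text refers to the averaged mix, which cannot be recomputed from a single row.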
5.1.3 Instruction-level Parallelism
In addition to finding the lower bound on performance, we examined the instruction stream for the tested operations to explore any potential for improvements in the typical RISC architecture that is assumed to remain with the hollow client. The trace tools provided us with counts of parallelizable pairs of Load instructions (Figure 6) and the average size of the basic block (Figure 7). To search for integer ALU instruction parallelism, we disassembled the traces and scanned the top 20 most frequent procedures (using a perl script) for potentially parallelizable instructions, in groups of 8, 4, or 2 instructions (see Figure 8).

FIGURE 6. Percentage of loads followed by loads for each kind of common operation.

FIGURE 7. Average basic instruction block size for each kind of common operation.

FIGURE 8. Percentage of parallelizable ALU instructions for each kind of common operation.
Figure 6 shows us that we do have more Load instruction parallelism on the hollow client than on the typical RISC machine, but an average opportunity rate of 6-7% does not offset the added complexity of including an additional Load unit in the processor. We do, however, see that there is certainly more integer ALU parallelism, especially for 3D graphics applications, and that will be a good optimization feature to consider. In fact, machines that emphasize graphics and multi-media seem to be prime targets for a special set of integer instructions, called packed instructions, that operate on arrays of small integers in parallel (e.g. eight 8-bit integers) for tasks related to problems of finite size such as the monitor screen, a certain resolution of an image, etc.
5.2 Impact of the Network
As the hollow client resides on the network, we felt it necessary to study the effect the network has on the client's ability to carry out its local tasks. The network bandwidth and latency have the most impact during real-time data transfers, such as loading a page in a browser or updating the design information in the database (on the net, likely on the server side) while editing a document. The size of each data transfer and the transfer frequency are significant factors in determining the network access performance. In order to explore these factors, we built a data transfer model consisting of a simple server and a simple client, and examined the communication behavior between them on different machines. The communication protocol used is very similar to that for SMTP and FTP.
5.2.1 Effect of Data Transfer Size on Performance of Network Access Operations
When sending updates to (or receiving them from) the network, the question is how large the data transfer can be while the network latency remains acceptable. Given that the measured execution time acceptable for bringing up a document is about 500ms, our goal here is to find the maximum data transfer size that still allows the hollow client to perform updates in real-time.
To achieve this goal, we measured the execution time and number of instructions executed to transfer different sizes of data on three machines (see Figure 9 and Figure 10).

FIGURE 9. Execution time (in ms) per file transfer for different file sizes (in bytes) on Dec Alpha, Sun Sparc, and Dec MIPS (in log-log scale).

FIGURE 10. Number of instructions (in kilo) per file transfer for different file sizes (in bytes) on Dec Alpha, Sun Sparc, and Dec MIPS (in log-log scale).
From the figures, we see that for transfers of up to ~150 bytes, the execution time and instruction count per transfer are at their minimum, determined by the transmission protocol rather than by the file size. Beyond that point, both the execution time and the instruction count increase linearly with the file size. Our 500ms limit on the response time for loading a page into a browser (or design query system) therefore imposes an upper limit of ~50KB on the data transfer size permissible for the hollow client to perform interactive updates. With current transfer protocols, it seems that a minimum execution cost (about 10K instructions/transfer) is incurred by any transfer of up to around 150 bytes, so below that size the file transfer itself does not dominate the execution time of loading, say, a web page.
5.2.2 Effect of Network Access Frequency on Network Access Performance
Besides the size of the data transfer, the transfer frequency can also put a requirement on the hollow client performance. The question here is how often the client should access the network to maintain hollow client performance while still minimizing the effect of network latency. We ran traces similar to those above, but varied the frequency of network access relative to the total execution time between accesses. We modeled the different frequencies by putting integer instruction blocks of different sizes between transfers, and measured the execution time required to finish one data transfer and one instruction block. The results on three different machines are given in Figures 11, 12 and 13.

FIGURE 11. Execution time per iteration (in ms) for different file sizes (in bytes) and different number of instructions between file transfers (in kilo) on a Dec Alpha.

FIGURE 12. Execution time per iteration (in ms) for different file sizes (in bytes) and different number of instructions between file transfers (in kilo) on a Dec MIPS.

FIGURE 13. Execution time per iteration (in ms) for different file sizes (in bytes) and different number of instructions between file transfers (in kilo) on a Sun Sparc.
From the figures, we can see that the flat region at the bottom of the curves indicates an upper bound on the average file transfer size to use, along with a lower bound on the number of instructions to execute between network accesses, if the client's performance is to be relieved of its dependence on the network. It seems that the size can be up to about 5 Kbytes, with at least 200K instructions between accesses, if the execution time per file transfer is to remain dominated by non-network activities (the local integer instructions) on the hollow client. When the data transfer size grows beyond that point, the network latency dominates the execution time even when network accesses are infrequent (large instruction block size). So, the MIPS lower bound we have established on the local operations of the hollow client will only be the issue to consider when working under conditions similar to those in the lowest region in Figure 11.
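The two bounds can be expressed as a simple check, sketched here in Python. The constants are the approximate values read off the figures in the text (about 5 Kbytes and 200K instructions); the function names, and the use of 50 MIPS to turn the instruction bound into a transfer rate, are assumptions of this sketch.

```python
MAX_TRANSFER_BYTES = 5 * 1024       # about 5 Kbytes per transfer
MIN_INSTRUCTIONS_BETWEEN = 200_000  # about 200K instructions between transfers


def processor_bound(transfer_bytes, instructions_between):
    """True if the workload sits in the flat region where local execution,
    not the network, is expected to dominate the time per iteration."""
    return (transfer_bytes <= MAX_TRANSFER_BYTES
            and instructions_between >= MIN_INSTRUCTIONS_BETWEEN)


def max_transfers_per_second(mips=50.0):
    """Highest transfer rate that still leaves the minimum instruction
    block between transfers, for a processor running at `mips`."""
    return mips * 1e6 / MIN_INSTRUCTIONS_BETWEEN
```

At the 50 MIPS lower bound, 200K instructions take 4 ms, so `max_transfers_per_second()` returns 250.0; this is an upper limit on transfer rate implied by the instruction bound alone, not a measured figure.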
6.0 Future Work
While we performed a large number of traces to simulate the set of operations we perceive as those most frequently run locally on the client, it would be of great interest to examine and quantify the appropriate partitioning of tasks between a given type of hollow client and its servers. It is also of interest to derive not only a performance lower bound for the hollow client, but also a bound on power consumption, a most critical factor in the design of portable machines. We can estimate the clock frequency of our hypothetical hollow client: we already gave it a MIPS lower bound of 50 and estimated a CPI of about 1, so we can put a lower bound of 50MHz on its clock frequency. Since we know the minimum number of instructions that will be executed for each of the common operations we have studied, we can determine the amount of power consumed per operation, and thereby provide an upper bound on the lifetime of a given battery to be used in some of the lighter, portable hollow client machines. This sort of low-power design study at the architectural level has been shown to be a more effective way to reduce the power consumption of electronic circuits.
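The proposed estimate could be set up along the following lines. This is only a sketch of the arithmetic: the function names are assumptions, and any average processor power or battery capacity passed in is a hypothetical input, since the study did not measure power.

```python
def min_clock_mhz(mips, cpi):
    """Clock rate (MHz) needed to sustain a MIPS rating at a given CPI."""
    return mips * cpi


def energy_per_operation_j(instruction_count, cpi, clock_mhz, power_w):
    """Energy (joules) for one operation at a constant average power."""
    seconds = instruction_count * cpi / (clock_mhz * 1e6)
    return power_w * seconds


def operations_per_charge(battery_wh, energy_per_op_j):
    """Upper bound on operations per battery charge (1 Wh = 3600 J)."""
    return battery_wh * 3600.0 / energy_per_op_j
```

With the text's figures, `min_clock_mhz(50, 1.0)` gives 50 MHz, and a 14M-instruction operation takes 0.28 s at that clock; at a hypothetical 2 W average power this is 0.56 J per operation.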
7.0 Summary
With data display and edit (brought in from and out to the distributed set of resources on the net) as the hollow client's main operations at the user end, and with all else migrated to the servers on the net, we have learned that even without taking into account real network bandwidth deficiency or finite memory system performance, there is still an ultimate lower bound on the performance required of the hollow client if it is to act as a feasible network computer:
- Response time for such operations seems to govern the performance requirements on the client whenever the network or the memory system do not.
- Using a RISC architecture, with a target response time for the common interactive operations, similar to that of today's low-end workstation (Dec MIPS), the hollow client processor is required to have larger than 50 MIPS performance, assuming an ideal memory system and high-bandwidth network.
- A useful optimization on the client processor would be to exploit the additional (10-20%) parallelism of integer instructions seen on the common operations.
- The network bottleneck can be alleviated if the size and rate of network transfers are both limited, meaning having less than 5-10KB per transfer and more than 200K instructions between transfers.
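As a rough illustration of the last bullet, the transfer-size and instruction-count limits can be turned into the sustained bandwidth the network would have to supply in the worst case allowed by those limits. This sketch assumes that the client runs at exactly the 50 MIPS bound, that 1 KB is 1024 bytes, and that transfers are evenly spaced with no overlap between computation and communication. None of these assumptions is stated in the report, and the function names are illustrative only.

```python
def seconds_between_transfers(instructions: float, mips: float) -> float:
    """Time to execute `instructions` at a rate of `mips` million instructions/s."""
    return instructions / (mips * 1e6)


def sustained_bandwidth_bytes_per_s(transfer_bytes: float, interval_s: float) -> float:
    """Average bandwidth needed to deliver one transfer per interval."""
    return transfer_bytes / interval_s


# 200K instructions between transfers at the 50 MIPS lower bound.
interval_s = seconds_between_transfers(200_000, 50)

for kilobytes in (5, 10):
    bandwidth = sustained_bandwidth_bytes_per_s(kilobytes * 1024, interval_s)
    print(f"{kilobytes} KB every {interval_s * 1e3:.1f} ms "
          f"-> {bandwidth * 8 / 1e6:.2f} Mbit/s")
```

Under these assumptions the limits correspond to roughly 10 to 20 Mbit/s of sustained demand. Smaller transfers or longer compute intervals reduce it proportionally.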
8.0 Appendix A: Machine Information
TABLE 2. Information on the machines used in data collection.
Machine name | Hardware Architecture | CPU Type | Chip Speed | Cache | Memory
jersey | SGI RISC | MIPS R4400 | 250 MHz | 32 KB/2 MB | 128 MB
lolita | SUN SPARC | SPARC 20 | 60 MHz | 16 KB/1 MB | 96 MB
susie | DEC ALPHA | DECchip 21064 | 190 MHz | 1 MB | 256 MB
trivial | DEC RISC | MIPS R3000 | 45 MHz | 16 KB | 64 MB
ic | DEC RISC | MIPS R4400 | 64 MHz | 16 KB/1 MB | 128 MB
9.0 Appendix B: Dynamic Instruction Stream Measurements
TABLE 3. Instruction Mix (in %). Each entry is Dec Alpha / Dec MIPS / SGI MIPS.
Operation | ALU | Load | Store | Branch | FP | NOP
Draw Line (xfig) | 31.7/45.4/- | 23.3/26.7/- | 12.1/12.6/- | 21.1/15.3/- | 4/9.1/- | 13.4/3.8/-
Place Object (FSM Editor) | 35.1/52.6/31.3 | 20.3/21.2/24 | 9.6/7.3/10.9 | 19.9/18.6/16.9 | 9.7/9.2/11.8 | 13/5/11.3
Move Object (xfig) | 41.5/31.8/- | 26.7/21.8/- | 15.7/13.2/- | 18.2/17.6/- | 9/3.8/- | 2.4/13.6/-
Move Object (FSM Editor) | 53.2/35.7/36.9 | 20.4/19.1/25.0 | 7.2/9.2/11.9 | 20.2/22.8/21.1 | 1.2/5.6/4.0 | 6/13.5/11.1
Pan/Zoom (2-dimension) | 44.9/32.8/- | 22.5/22.1/- | 15.6/12.2/- | 18.2/19.0/- | 9.3/2.4/- | 2/13.5/-
Render (3-dimension) | -/-/30.2 | -/-/32.3 | -/-/15.9 | -/-/14.8 | -/-/5.5 | -/-/11.1
Animate (3-dimension) | -/-/29.7 | -/-/32.4 | -/-/16.3 | -/-/13.8 | -/-/6.5 | -/-/10.8
Window Popup/Close | -/35.4/- | -/19.2/- | -/10.0/- | -/23.7/- | -/0/- | -/14/-
TABLE 4. Each entry is Dec Alpha / Dec MIPS / SGI MIPS.
Operation | Load after Load (%) | Basic Block | Total Cycle Count (M) | IC (M)/operation | Time/operation (ms)
Draw Line (xfig) | 5.9/13/- | 4.7/6.5/- | 114/4.94/- | .436/.229/- | 18.1/1.72/-
Place Object (FSM Editor) | 8.3/6.5/9.0 | 5.0/5.4/5.1 | 1151/515/257 | 4.22/5.90/12.7 | 223/44.4/103
Move Object (xfig) | 10.4/.606/- | 5.5/5.7/- | 52.1/177/- | .514/1.43/- | 3.84/60.6/-
Move Object (FSM Editor) | 5.3/-/6.91 | 4.9/-/- | 1358/1404/1478 | 13.2/14.04/13.89 | 94/278/96.2
Pan/Zoom (2-dimension) | 7.9/6.2/- | 5.5/5.3/- | 32.4/70.7/- | .158/.571/- | 1.20/28.4/-
Render (3-dimension) | -/-/8.14 | -/-/6 | -/-/696 | -/-/3.95 | -/-/25.1
Animate (3-dimension) | -/-/8.4 | -/-/6 | -/-/847 | -/-/4.85 | -/-/31.1
Window Popup/Close | -/6.0/- | -/4.2/- | | -/9.05/- | -/-/-
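The IC/operation and Time/operation columns of Table 4 can be combined into an effective instruction rate for each machine. The sketch below parses the slash-separated cell format used in these tables and computes that rate. It assumes that both columns describe the same single operation, which the table does not state explicitly, so the result is only a rough derived figure and not a number reported by the authors. The helper names are illustrative.

```python
from typing import List, Optional

MACHINES = ("Dec Alpha", "Dec MIPS", "SGI MIPS")


def parse_cell(cell: str) -> List[Optional[float]]:
    """Split an 'a/b/c' table cell into floats; '-' (no measurement) becomes None."""
    parts = cell.strip().rstrip("/").split("/")
    return [None if p.strip() == "-" else float(p) for p in parts]


def effective_mips(ic_millions: Optional[float], time_ms: Optional[float]) -> Optional[float]:
    """Millions of instructions per second from IC (in millions) and time (in ms)."""
    if ic_millions is None or time_ms is None:
        return None
    return ic_millions / time_ms * 1000.0


# Place Object (FSM Editor) row of Table 4.
ic_per_op = parse_cell("4.22/5.90/12.7")
time_per_op = parse_cell("223/44.4/103")

for machine, ic, ms in zip(MACHINES, ic_per_op, time_per_op):
    rate = effective_mips(ic, ms)
    print(machine, "n/a" if rate is None else f"{rate:.1f} MIPS")
```

Entries marked `-` propagate as `None`, so rows measured on only one machine still parse without special handling.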
TABLE 5. Instruction Mix (in %). Each entry is Dec Alpha / Dec MIPS / SGI MIPS.
Operation | ALU | Load | Store | Branch | FP | NOP
Page Browse | 43.0/30.4/- | 31.8/25.8/- | 12/8.9/- | 11.9/17.8/- | 0/0/- | 1.9/18.4/-
Load/Save | 44.6/29.6/- | 31.2/24.3/- | 10.8/9.2/- | 19.7/13.5/- | 0/0/- | 19.2/.08/-
TABLE 6. Each entry is Dec Alpha / Dec MIPS / SGI MIPS.
Operation | Load after Load | Basic Block | Total Cycle Count | IC/operation | Time/operation
Page Browse | 10.8/5.8/- | 8.4/5.6/- | 576M/1447M/- | 5.36M/13.6M | 40.3ms/.545ms
Load/Save | 61.8/5.3/- | 7.4/5.1/- | 138M/457M/- | 2.45M/12.2M/- | 18.4ms/496s/-
10.0 References
[1] The WELD Distributed Design Integration System, UC Berkeley CAD Group,
Advisor: Prof. Richard Newton.
http://embedded.eecs.berkeley.edu/Respep/Research/dds
[2] Baker, Wendell, "WELD--The Hollow Client Approach", UC Berkeley CAD Group,
Advisor: Prof. Richard Newton.
http://www.eecs.berkeley.edu/~wbaker/arpa96/03-review/weld.html
[3] Socarras, A.E.; Cooper, R.S.; Stoneycypher, W.F., "Anatomy of an X
terminal", IEEE Spectrum, March 1991, vol. 28, no. 3, pp. 52-55.
[4] Slothouber, Louis P., "A Model of Web Server Performance"
http://louvx.biap.com/white-papers/performance/overview.html
[5] Spiller, Mark D., "HTTP vs. FTP", school project paper
http://www.eecs.berkeley.edu/~mds/java/http.html
[6] Architecture Tracing Tools
http://www.cs.Berkeley.edu/~xjiang/cs252TA.html
[7] Smith, Michael, "Tracing with Pixie", Center for Integrated Systems,
Stanford University.
http://http.cs.Berkeley.edu/~gnguyen/cs252/F94/pixie/pixie_manual.ps
[8] Sparc Architectural Simulator "SimOS", CAD Group, Stanford University
http://powderkeg.Stanford.EDU:80/~herrod/SimOS
[9] Mittal, Alok, "Benchmarking for Graphics Applications", CS252 Project
Report. http://http.cs.Berkeley.edu/~alok/acad/cs252_project.html
[10] Edwards, Stephen, "States" FSM Editing tool, UC Berkeley CAD Group
http://ptolemy.eecs.berkeley.edu/~sedwards/research.html
[11] Shilman, Michael, "InteractND--a 3D VRML Browser", UC Berkeley CAD Group
http://www.eecs.berkeley.edu/~michaels/research/InteractND
Last updated: May 8, 1996