Lab - Performance

TODO: Make more, fatter, data points

Okay, just follow along. Questions are marked with a Q.

Resources

Using the time utility

We're going to use time to gather runtime data about our quicksort, on strings.

We will sort inputs of 10,000, 20,000, ... , 100,000 words, and graph our results.

E.g.:

     n     T(n)   T(n)/f(n)
                  f(n) = n   f(n) = n^3   f(n) = n^2
    10      520         52     0.520000          5.2
    20     2080        104     0.260000          5.2
    30     4680        156     0.173333          5.2
    40     8320        208     0.130000          5.2
    50    13000        260     0.104000          5.2
    60    18720        312     0.086667          5.2

Clearly T(n) is increasing, apparently without bound. So, it is bounded below (but not tightly) by a constant. We say T(n) ∈ ω(1).

Note: Big-O is an upper bound, which may or may not be tight. Little-O is a loose upper bound, which is not tight. Similarly, Big-Omega is a lower bound, which may or may not be tight, and little-omega, ω, is a loose lower bound, not tight.

When we divide T(n) by f(n) = n, we see these values apparently still increasing to ∞. So, T(n) is bounded below by a line. We say T(n) ∈ Ω(n). In other words, T(n) grows at least as fast as a line.

Next, we choose f(n) = n^3, and we see that T(n)/n^3 is probably decreasing to zero (we need to look around a bit more to be sure). This means that n^3 is an upper bound. I.e., T(n) grows no faster than n^3. So, T(n) ∈ O(n^3).

We now choose f(n) = n^2. Well, I should've made this example a little more interesting. We can surmise that T(n) is, in fact, 5.2 n^2. The important point is that T(n) / n^2 is tending towards some non-zero constant. So, T(n) ∈ Θ(n^2).

Q 1 Supply your chart (just the tabular data: n, T(n), and T(n)/f(n) for various choices of f(n); not a graph) and your conclusions in your gradesheet.

Scale each column as convenient. That is, remove leading (misleading) and trailing zeroes. If you use scientific notation, do not change exponents. Make it easy on the eyeballs.

And get the columns to line up nicely. (No tabs.) Limit yourself to 120 columns.

Guys, keep conclusions succinct. I don't need a bed-time story, nor color commentary. I don't need your procedure. Just give me your thoughts for each column, as a simple statement, using Big-Oh notation, then summarize, the best you can say. Succinctly. Using Big-Oh notation.

C's clock function

C has a clock() function in its standard library, declared in <time.h>. Other languages have similar ideas. See sortr.c for an example.

It gives us a little better granularity about what we're timing (we can skip overhead, etc.). Essentially, there is a clock that starts at about 0 when your program starts, and should only tick while your program is executing (as opposed to being sliced out). Each call to clock() just grabs that time. So, you grab a start and end time, and take the difference.

clock() returns the # of ticks, which is system-dependent, both the value and the granularity. Dividing by CLOCKS_PER_SEC will give you the time, in seconds. But, for graphing/evaluation purposes, ticks are fine. Don't divide away significant digits.

Using a bit of the gprof utility

For another approach at the same problem, we're going to count the # of swaps performed, rather than measure raw time.

We'll still be working w/the same set of inputs: 10,000, 20,000, ... , 100,000 words, and graph our results.

Q 2 Supply your chart (just the tabular data, no graphs), and your conclusions in your lab sheet.