Lecture 4: Searching (linear and binary) and Sorting (bubble, merge and quick) Algorithms


I. Intro to Algorithms

In computer science there are two distinct parts that make the field work: systems and theory. Systems covers the engineering side of computer science, including the software design topics I have gone over as well as operating systems and networks. Theory comprises almost all of the mathematical foundations that pave the way for implementing solutions to problems. This is what we will focus on for the rest of the term. Algorithms are the heart of the theory side of computer science.

One aspect of an algorithm is whether it gets the job done, that is, its correctness. This is our major concern in this class this term... but besides correctness there are many other concerns too. One of them is runtime complexity: instead of worrying about exactly how long an algorithm takes, you are concerned with how increasing the input size affects the computation time. Another aspect that many people care about is the feasibility of using an algorithm... will it take too long? Is there a way to reduce the problem to another one? But those are topics for later studies in algorithms.

One thing about algorithms in computer science is that you are not confined purely to the mathematical side of solving a problem. Many people will think of ways to make an algorithm very efficient using all kinds of math tricks, like recursion, linear optimization and so on... but the biggest leverage in computer science is data structures. It is almost impossible to talk about algorithms in computer science without putting heavy emphasis on how the choice of data structure, and the way the data is arranged in it, can change the whole formulation of a problem. We will see a little of that today in searching.

II. Searching

Intro

Searching is one of the most important tasks in computer science. It is used in applications ranging from databases to AI to word processing. Depending on your needs, the underlying structures you have, and what your data looks like... you may use a very different algorithm for what seems to be the same problem. Today we will only look at searching lists.

Linear Search

This type of search is the easiest... it's probably what you would have done even if no one had taught it to you or told you its name. You do a linear search by going from element to element in a straight line, front to back, skipping nothing.

Algorithm: Linear Search

  1. Look at the first element.
  2. If the current element is the last one and is not what you want, return not found.
  3. If the current element is what you are looking for, return its location.
  4. Move to the next element.
  5. Go back to step 2 and repeat.
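The steps above can be sketched in Java as follows. This is a minimal sketch, assuming the list is an `int[]` and that "not found" is signaled by returning -1 (a common convention, not something these notes specify); the class and method names are hypothetical.

```java
// Hypothetical class name; searches an int array from front to back.
class LinearSearch {

    /** Returns the index of target in list, or -1 if it is not found. */
    static int linearSearch(int[] list, int target) {
        // Steps 1, 4 and 5: start at the first element and move one element at a time.
        for (int i = 0; i < list.length; i++) {
            // Step 3: this is what we want, so return its location.
            if (list[i] == target) {
                return i;
            }
        }
        // Step 2: we looked at every element without finding the target.
        return -1;
    }

    public static void main(String[] args) {
        int[] list = {7, 2, 9, 4};
        System.out.println(linearSearch(list, 9)); // prints 2
        System.out.println(linearSearch(list, 5)); // prints -1
    }
}
```

Because the loop condition does the "end of list" check, this version also handles an empty array without a special case.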

Here's an animation of linear search: https://www.cs.usask.ca/resources/tutorials/csconcepts/1998_3/linear/java/index.html

Courtesy of http://lecture.ecc.u-tokyo.ac.jp/~ueda/cs06/cs11.html. I have no clue what the Japanese says, but it's trying to find something that's not in the list.

Binary Search

Binary search is the first specialized algorithm that we will talk about. This search assumes that the list is SORTED: every element must be in ascending order from first to last, or descending... whichever one you design your implementation for. Here is what the algorithm does:

Algorithm: Binary Search

  1. Bound yourself to the whole list.
  2. Look at the first and last elements that you are bounded to and make sure the bound is valid (the first position does not come after the last). If it is not valid, return not found. If the first or last element is what you want, return its location.
  3. Otherwise, point to the middle element. If the middle element is what you want, return its location.
  4. If what you want is less than the middle element (for an ascending list), make the half before the middle the new bound... else make the half after the middle the new bound.
  5. Go back to step 2 and repeat as necessary.
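Here is a minimal iterative Java sketch of binary search, assuming an `int[]` sorted in ascending order and -1 for "not found". For brevity it compares only against the middle element in each round; checking the first and last elements as in steps 2 and 3 is an optional shortcut that does not change whether the target is found. The class name and sample data are made up for illustration.

```java
// Hypothetical class name; assumes the array is sorted in ascending order.
class BinarySearch {

    /** Returns an index of target in the sorted list, or -1 if it is not found. */
    static int binarySearch(int[] list, int target) {
        int low = 0;                  // first index of the current bound
        int high = list.length - 1;   // last index of the current bound

        // The bound is valid while low <= high; once they cross, the target is absent.
        while (low <= high) {
            // Written this way instead of (low + high) / 2 to avoid int overflow.
            int mid = low + (high - low) / 2;
            if (list[mid] == target) {
                return mid;
            } else if (target < list[mid]) {
                high = mid - 1;       // keep the smaller half, excluding mid
            } else {
                low = mid + 1;        // keep the larger half, excluding mid
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {3, 8, 15, 23, 39, 42, 57};
        System.out.println(binarySearch(sorted, 39)); // prints 4
        System.out.println(binarySearch(sorted, 10)); // prints -1
    }
}
```

Each round throws away half of the remaining bound, which is why binary search needs the list to be sorted. Java's standard library also provides `java.util.Arrays.binarySearch`, which returns a negative value when the key is absent.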

Here's an animation of binary search: http://www.cosc.canterbury.ac.nz/mukundan/dsal/BSearch.html. Here's the code: http://www.java-tips.org/java-se-tips/java.lang/binary-search-implementation-in-java.html

Courtesy of http://lecture.ecc.u-tokyo.ac.jp/~ueda/cs06/cs11.html... and once again I have no clue what it says, but judging by the exclamation mark, I'm assuming it's trying to find 39.

III. Sorting

Intro

Sorting, as we saw, is essential for some things to work, like binary search. But beyond that, sorting is useful on its own. We see it every day in online shopping sites and search results that let us sort by certain fields, like name, date, etc.

Bubblesort

It used to be that a course like this would always start off by teaching you bubblesort... because it was easy... but I guess we all decided that insertion sort is even easier. Bubblesort has one of the worst runtime complexities of any sort (runtime complexity will be covered in week 5), but it is one of the easiest to understand and code. Here's the basic algorithm for bubblesort:

Algorithm: Bubblesort

  1. Pointer 1 looks at the first element, pointer 2 looks at the second element (if it exists).
  2. If pointer 1 points to the last element, exit.
  3. Move pointer 2 across all remaining elements, swapping the two elements whenever pointer 2 points to an element smaller than the one at pointer 1 (if we're doing an ascending sort).
  4. Move pointer 1 to the next element.
  5. Move pointer 2 to the element after pointer 1.
  6. Go back to step 2 and repeat.

This algorithm makes sure that pointer 1 always points to the smallest of the remaining elements by the time pointer 2 has finished looking at all remaining elements in the list. Here's the animation and code: http://www.cs.oswego.edu/~mohammad/classes/csc241/samples/sort/Sort2-E.html
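The pointer-based steps above can be sketched in Java like this, assuming an `int[]` sorted in place in ascending order; `i` plays the role of pointer 1 and `j` of pointer 2. The class name is hypothetical. Note that this pointer 1/pointer 2 variant is sometimes called "exchange sort" in textbooks, while the classic bubblesort instead keeps swapping adjacent out-of-order pairs; both need on the order of n^2 comparisons in the worst case.

```java
import java.util.Arrays;

// Hypothetical class name; sorts an int array in place, ascending.
class Bubblesort {

    static void sort(int[] list) {
        // Pointer 1 (i) stops before the last element (step 2).
        for (int i = 0; i < list.length - 1; i++) {
            // Pointer 2 (j) sweeps every element after pointer 1 (steps 3 and 5).
            for (int j = i + 1; j < list.length; j++) {
                // Swap if pointer 2 sees something smaller than what pointer 1 sees.
                if (list[j] < list[i]) {
                    int temp = list[i];
                    list[i] = list[j];
                    list[j] = temp;
                }
            }
            // Now list[i] holds the smallest of the elements from index i onward.
        }
    }

    public static void main(String[] args) {
        int[] list = {5, 1, 4, 2, 8};
        sort(list);
        System.out.println(Arrays.toString(list)); // prints [1, 2, 4, 5, 8]
    }
}
```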

Mergesort

This is a classic recursive problem: you keep dividing the list into halves until you are down to lists of one or two elements, where sorting is trivial. The algorithm then merges the lists back together to form a sorted list. It is harder to code because of all the bookkeeping... but it is far better than bubblesort as lists grow (O(n log n) versus O(n^2) comparisons in the worst case; we will cover what that means in week 5). It is only slightly worse than counting sorts... which we will not cover because their performance and applicability depend heavily on the data (for example, counting sort needs integer keys in a known, small range). Here is the algorithm for the sort:

Algorithm: Mergesort

  1. Take the list and divide it in half.
  2. Repeat step 1 until the list is divided into a collection of lists of only size 1 or 2 (a list of two is sorted with at most one swap).
  3. Go through all the halves and merge them pairwise:
    1. Merge: compare the first element of the first half with the first element of the second half.
    2. Move the smaller one (for an ascending sort) into the new "combined list" of the two, removing it from the half it came from. If one of the halves is empty, just move the other half's first element.
    3. Repeat sub-step 1 until both halves are empty.
  4. Repeat step 3 until all the halves are merged back into a list of the original size.
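Here is a minimal top-down Java sketch of mergesort, assuming an `int[]` and an ascending sort. It differs from the steps above in one bookkeeping detail: it keeps splitting until each sublist has a single element (a list of one is already sorted), so lists of two need no special case. The class and helper names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name; returns a sorted array and leaves the input unchanged.
class Mergesort {

    static int[] sort(int[] list) {
        // A list of 0 or 1 elements is already sorted.
        if (list.length <= 1) {
            return list;
        }
        // Step 1: divide the list in half, then sort each half recursively (step 2).
        int mid = list.length / 2;
        int[] left = sort(Arrays.copyOfRange(list, 0, mid));
        int[] right = sort(Arrays.copyOfRange(list, mid, list.length));
        // Steps 3 and 4: merge the two sorted halves back together.
        return merge(left, right);
    }

    static int[] merge(int[] left, int[] right) {
        int[] combined = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        // Compare the front elements and move the smaller one into the combined list.
        while (i < left.length && j < right.length) {
            if (left[i] <= right[j]) {   // <= keeps equal elements in their original order
                combined[k++] = left[i++];
            } else {
                combined[k++] = right[j++];
            }
        }
        // One half is empty: copy whatever remains of the other half.
        while (i < left.length) {
            combined[k++] = left[i++];
        }
        while (j < right.length) {
            combined[k++] = right[j++];
        }
        return combined;
    }

    public static void main(String[] args) {
        int[] list = {38, 27, 43, 3, 9, 82, 10};
        System.out.println(Arrays.toString(sort(list))); // prints [3, 9, 10, 27, 38, 43, 82]
    }
}
```

The extra arrays created by `copyOfRange` and `merge` are the "bookkeeping" cost mentioned above: this version trades memory for simplicity.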

I know that algorithm is hard to describe in words... so here is the animation: http://www.cse.iitk.ac.in/users/dsrkg/cs210/applets/sortingII/mergeSort/mergeSort.html. Here's the code for it: http://www.java-tips.org/java-se-tips/java.lang/merge-sort-implementation-in-java.html

Quicksort

Despite what the name suggests, it is not the fastest, at least in theory: its worst case is O(n^2), the same as bubblesort. In practice, though, partly thanks to how well it suits modern hardware, it often turns out faster than algorithms it is supposed to be slower than, such as mergesort. The key idea here is divide and conquer. It keeps partitioning the list into two sublists: one holding the elements less than a pivot and one holding the elements greater than it. The pivot is like the "center" of the list. The choice of pivot can greatly affect the speed of an individual execution of the algorithm. Here's the algorithm:

Algorithm: Quicksort

  1. Take the partition and specify a pivot.
  2. Place all elements less than the pivot in front of it and all elements greater than the pivot after it (elements equal to the pivot can go on either side).
  3. Make the "less than" side of the pivot one partition and the "greater than" side another partition.
  4. Apply step 1 to both partitions, and keep going until every partition has at most 1 or 2 elements (a partition of two is sorted with at most one swap).

As you can see, this algorithm relies heavily on the first step, specifying a pivot. This will not be the focus of this course, since it is a research topic all on its own and can be a very hard problem depending on the data that you have. What we are concerned about is why and how this algorithm works. Here's an animation of it to make understanding it easier: http://www.cse.iitk.ac.in/users/dsrkg/cs210/applets/sortingII/quickSort/quickSort.html. Here is the code for it: http://www.wanginator.de/studium/applets/quicksort_en.html
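Here is a minimal in-place Java sketch of quicksort, assuming an `int[]` and an ascending sort. It uses one of the simplest pivot choices, the last element of each partition (the Lomuto partition scheme); that is just one option among many, and it performs poorly on input that is already sorted. The class and helper names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name; sorts an int array in place, ascending.
class Quicksort {

    static void sort(int[] list) {
        sort(list, 0, list.length - 1);
    }

    // Sorts the partition list[low..high] (inclusive).
    private static void sort(int[] list, int low, int high) {
        // A partition with 0 or 1 elements is already sorted.
        if (low >= high) {
            return;
        }
        int p = partition(list, low, high);
        sort(list, low, p - 1);   // the "less than" side (step 3)
        sort(list, p + 1, high);  // the "greater than or equal" side (step 3)
    }

    // Steps 1 and 2: use list[high] as the pivot, move smaller elements in front of it,
    // and return the pivot's final index.
    private static int partition(int[] list, int low, int high) {
        int pivot = list[high];
        int boundary = low;  // every index before this holds a value < pivot
        for (int i = low; i < high; i++) {
            if (list[i] < pivot) {
                swap(list, i, boundary);
                boundary++;
            }
        }
        swap(list, boundary, high);  // put the pivot between the two sides
        return boundary;
    }

    private static void swap(int[] list, int a, int b) {
        int temp = list[a];
        list[a] = list[b];
        list[b] = temp;
    }

    public static void main(String[] args) {
        int[] list = {10, 80, 30, 90, 40, 50, 70};
        sort(list);
        System.out.println(Arrays.toString(list)); // prints [10, 30, 40, 50, 70, 80, 90]
    }
}
```

Trying a different pivot rule (for example, a randomly chosen element) only requires swapping the chosen element into `list[high]` at the start of `partition`.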

IV. Note for Project 1

For searching, you can just use linear search... but we would like you to use either Mergesort OR Quicksort for sorting. I have provided links to code for both... you just have to make it work for your data structure (classes).