CS 647 Distributed Systems

Course Syllabus
Credits: 3 hour lecture [3 credits]
Spring 2019

General Information

Course Coordinator(s): Dr. Colin S. Gordon
Instructor Contact Information (phone, email, website):
csgordon@drexel.edu
https://www.cs.drexel.edu/~csgordon/
Office Hours, Location, Mailbox: Office hours by appointment, in 3675 Market Street Office 1137 for those near campus, or online for remote students.
If you have anything you'd like to discuss, please do not hesitate to schedule an appointment! It really is not an imposition. There are no recurring hours because office hours associated with evening classes have historically been poorly attended due to student scheduling conflicts, and the presence of recurring hours tends to discourage students who can't make them from reaching out.

Student Learning Information

Course Description

In-depth discussion of fundamental concepts of distributed computer systems. Covers development techniques and runtime challenges, with a focus on reliability and system validation techniques. Subjects discussed include: interprocess communication, remote procedure calls and method invocation, middleware, distributed services, coordination, transactions, replication and weak data consistency models. Significant system-building term project in Java or similar language.

Course Purpose within a Program of Study

Within the revised MSSE program, this will serve as one of 6 possible CS electives (3 required along with 3/6 IS electives) to provide broader knowledge of software engineering. The MSSE degree emphasizes modern practices and techniques to produce reliable software that functions as desired, in a timely manner. Distributed systems are an increasingly important domain as more software systems move to shared cloud infrastructure.

Within the CS PhD program, the course will serve as an elective suitable for any graduate student, but particularly for those with research interests in software engineering, systems, or programming languages.

Statement of Expected Learning

The course objectives are to:

As learning outcomes, students completing this course should be able to:

Course Materials

Required: No textbooks; instructor-selected research papers.

Recommended: Designing Data-Intensive Applications, by Martin Kleppmann. This book is optional, but will provide a nice extended discussion for much of the course material. In addition to the natural option to purchase a hardcopy, it is available in electronic form (with and without DRM, see the book’s site), and via O’Reilly’s Safari Books Online platform. You can access this via Drexel Libraries’ subscription by going here, clicking on “Full Text Online” and signing in with your Drexel credentials.

We may also read excerpts from Concurrency Control and Recovery in Database Systems, which is freely available in PDF format from the authors.

Required and Supplemental Materials and Technologies

This course will be programming-intensive; you should expect to write a moderate amount of very challenging code. Assignments must be completed using one of the following actor frameworks:

Akka.NET is a .NET port of the original JVM implementation of Akka, so both provide nearly identical interfaces. I’m happy to read and grade code in any of the four languages above, though I’m less familiar with F# than the others, so I’ll be less helpful debugging F#-specific issues.

Projects in the course will require you to write full self-contained projects using a build file for a cross-platform build tool that handles fetching dependencies, compiling your code, and executing the code. On the JVM, any of the major tools is acceptable (ant+ivy, maven, gradle, or sbt). For .NET, msbuild via the console, or dotnet for .NET Core projects are acceptable (update: also fake for F# projects). Projects must build from the command line: projects that only build via an IDE (Eclipse, Visual Studio, etc.) will be penalized, though you are welcome to use whatever editors or IDEs you like for writing your code.

The JVM version of Akka has the best documentation. Akka.NET’s documentation seems a bit more sparse, but since it attempts to expose nearly the same API as the JVM version, you could largely adapt the Java version of the JVM Akka documentation for C# if you needed to.

Both versions of Akka have reasonable books available to you through the Drexel Libraries:

Assignments, Assessments, and Evaluations

Graded Assignments and Learning Activities

The course grading is focused on responses to readings, as well as short homework assignments.

Readings

Each week (except the first) you will need to respond to two research papers, no later than 6pm the day before class. Late responses are not accepted, but below there is a policy allowing you to skip a few of these during the term without penalty.

For each paper you will submit a commentary that covers your understanding of the major points and contributions of the paper, the paper’s limitations, things you might not have understood (which we can explain in class!), ways to extend or improve the paper’s work, or other problems this paper’s solutions might provide insights to. Note that not all of these questions make sense for every paper we read. I mainly want to see (a) how well you understood the main ideas from the paper, and (b) that you’ve thought about how well or poorly it might really work.

You should consider:

Not all of these will make sense for every paper, but most of them are sensible to cover for most papers.

Note that different people will often have different takes on the same paper, disagree on whether a choice made by the authors is a strength or weakness, or find different things clarifying or confusing. This is all okay! Everyone has different backgrounds. If you are confused about some part of the paper, don’t be shy: almost certainly someone else found it confusing, too, and maybe they’re too timid to mention it. By pointing out what was confusing or difficult, we raise the opportunity to discuss it in class and help everyone understand better. I found some parts of these papers difficult or confusing the first time (or two) I read them - this is normal.

You may skip up to 4 reading responses during the term, by submitting, instead of a response, a sentence saying you are using a skip — by the deadline. Do not email me a request, just submit it. You may do this for any 4 responses, distributed any time in the term. You may skip one response in each of four weeks, both responses in each of two weeks, or take the in-between option of skipping one week completely and half the work on two additional weeks. Assignments for those weeks will simply be omitted when calculating your reading response grade for the term. One thing to consider, though: the skip only means you do not need to write a response. If it’s a paper that is useful for your homework, you may still find it beneficial to read the paper, even if you skip the response. You may apply skips retroactively; if at the end of the term you have a 0 or 1 you’d like to remove, and you have skips remaining, you can apply the skip to that response by emailing me a request in the final week of the term (I will not do this automatically). There is no bonus for having unused skips at the end of the term.

A Note on Proofs: Many of the papers you will read this term include formal proofs of correctness for an algorithm or protocol. You won’t be asked to produce proofs in this class. But, you will need to understand them. Some of your homework assignments will have you implementing algorithms from these papers, and understanding the proofs of correctness will help you think about the code you write. More broadly, outside this class, some of the proofs are fundamental impossibility results. It’s all well and good to understand that X is impossible, but you’re rarely asked to do X that is known to be impossible. Instead, you’re sometimes asked to do Y, where Y and X have some strong similarities. Sometimes Y is simpler than X in a key way that makes it possible. Sometimes Y is actually a variation on X. Understanding the proof for why X is impossible will help you recognize when you see variations on it.

Homeworks

The homeworks are tentatively on the following:

The late policy for homeworks is as follows: for the term, you have 5 late days to distribute between homework assignments at your discretion, with one restriction: the final homework may not be submitted after the end of the final week of classes.

Each homework should only require a modest amount of code, but that code might be very difficult to write and debug. In addition to coding, each homework will include some kind of reflection or analysis of what you did, generally in an open-ended way.

Grading Matrix

Responses are graded on a scale of 0 to 2, with the possibility of earning a 3/2 on any summary:

As noted above, the grading for responses emphasizes that you have made a serious effort to understand the paper and consider its strengths, limitations, etc. The response grading does not necessarily depend on correctness of your understanding, because most of you will lack the background to fully understand all parts of the paper, which can lead to misunderstandings and lack of understanding for parts. That’s okay and expected, and won’t hurt your grade unless your response suggests you have very fundamental misunderstandings of what the papers are even trying to accomplish. Historically, this has only occurred when people have not actually made a serious effort to read the papers. The responses are due the day before each lecture to allow me time to read the responses, and update the lecture material to both address common or important questions, and clarify any misunderstandings I see.

The late/skip policies were described above.

In addition to the late and skip policies, extensions are possible for good reason with reasonable notice. I am aware that students have jobs, family matters, paper deadlines for their PhD, etc., which can interfere with completing assignments. I want your grade to reflect your mastery of the material and quality of work you hand in, not whether or not you were fortunate enough to avoid major life events during the term. If something comes up during the term, let me know. If it’s unexpected (e.g., you end up in the ER when you were planning to work on coursework), let me know when you can and we’ll figure it out. If it’s something you know about in advance (e.g., you must travel for work), let me know as soon as you know, and we can discuss whether we should give you an extension on an assignment. I reserve the right to request supporting evidence for your stated need for an extension (only so far as justifying the existence of a good excuse; e.g., I might ask for a note confirming existence of a health issue interfering with attendance or assignment completion, but I don’t need to know the details of the particular health issue).

Attendance

If you’re in the in-person section: Drexel’s stated policy is that course attendance is mandatory for students in the in-person section. I will not take attendance explicitly every class, because it’s tedious and takes away time from actual material. But I do expect you to come, and if you are absent on a regular basis it will negatively affect your term grade, beyond the grading percentages above. That said, I understand things happen (you might get sick, your car might break down). If you’re in the in-person section and will miss class, please let me know.

If you’re in the online section: I will work to make it possible for you to ask questions live during class, for those interested in watching live. Otherwise, enjoy the flexibility of online learning.

Academic Integrity, Cheating, and Plagiarism

The list of links at the end of the syllabus includes a link to the University’s academic integrity policy. If you haven’t actually read it before, you should, because not meaning to plagiarize is not an excuse for plagiarism. This includes not realizing that something needed to be quoted, or being unfamiliar with the idea that paraphrased sentences still require citation (and possibly quotes), or opting to reuse someone else’s words or code because you’re not confident in the quality of your own.

The general idea is that you should not submit work (code or written prose) that is not your own unless it is properly attributed. Proper attribution includes, but is not limited to, putting direct quotes from someone else’s writing in quotation marks and citing the source, and giving the source for small snippets of code you might have taken from StackOverflow or similar. Again, you should read the actual university integrity policy.

The University leaves the penalty for cheating, plagiarism, etc. in a course up to the professor. If you cheat in this class, I will give you an F for the term. I realize that most cheating is a consequence of poor time management, or unexpected or hard-to-manage obligations beyond the class. That is exactly why you have late days for homeworks, skips for readings, and the course has a fairly flexible extension policy - I want you to succeed, but I want you to do so honestly. If you have any doubts about whether something might cross the line into cheating, please ask me before you do it. The worst I’ll say is “No, don’t do that.” And I’ll be glad you asked. This is far better than an F for the term.

A final note: If you quote or reuse other sources (properly, with attribution) so heavily I feel like you haven’t actually done the work for the assignment/response, I’ll give you a 0 (because you didn’t do the work). But you are allowed to take small snippets (e.g., setting up an actor, the basis for a build file) from external sources for small things that are not central to the assignment.

Grade Scale

The following scale will be used to convert points to letter grades:

Points      Grade
97-100      A+
92-96.99    A
90-91.99    A-
87-89.99    B+
82-86.99    B
80-81.99    B-
77-79.99    C+
72-76.99    C
70-71.99    C-
67-69.99    D+
60-66.99    D
0-59.99     F

Note that the instructor may revise this conversion if/when necessary.
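
For those who like to see the conversion spelled out, here is a minimal sketch of the scale as a lookup in Python. The function name letter_grade and the constant GRADE_CUTOFFS are made up for this illustration and are not part of any course tooling. It also assumes that a score falling between two listed ranges (e.g., 96.995) belongs to the lower band, which the scale itself does not state. If the conversion is revised, the cutoffs would change accordingly.

```python
# Hypothetical helper illustrating the grade scale; not official course tooling.
# Each entry is (minimum points for the band, letter grade), highest band first.
GRADE_CUTOFFS = [
    (97.0, "A+"), (92.0, "A"), (90.0, "A-"),
    (87.0, "B+"), (82.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (72.0, "C"), (70.0, "C-"),
    (67.0, "D+"), (60.0, "D"), (0.0, "F"),
]


def letter_grade(points: float) -> str:
    """Return the letter grade for a term score between 0 and 100."""
    if not 0.0 <= points <= 100.0:
        raise ValueError(f"points must be between 0 and 100, got {points}")
    # Assumption: each band is treated as [minimum, next band's minimum),
    # so scores between two listed ranges fall into the lower band.
    for minimum, letter in GRADE_CUTOFFS:
        if points >= minimum:
            return letter
    raise AssertionError("unreachable: the 0.0 cutoff matches every valid score")


if __name__ == "__main__":
    for score in (98, 91.5, 86.99, 59.99):
        print(score, letter_grade(score))
```

Under these assumptions, a term score of 91.5 maps to A- and 86.99 maps to B.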

Course Schedule

(This schedule is tentative and may change during the course.)

Most weeks attempt to pair:

Currently the syllabus is final up to and including week 9.

Week by week:

  1. Introduction, Overview, Actors
  2. Challenges and Time in distributed systems
  3. Strong Consistency
  4. Consensus (Paxos, Raft, etc.)
  5. CAP, FLP, and other impossibilities
  6. Weak and Eventual Consistency
  7. Getting Things Right
  8. Large scale data storage and processing: Hadoop & Spark
  9. Distributed Resource Management
  10. Cryptocurrency (a.k.a., Consensus round 2)

Academic Policies

This course follows university, college, and department policies, including but not limited to:

The instructor(s) may, at his/her/their discretion, change any part of the course before or during the term, including assignments, grade breakdowns, due dates, and schedule. Such changes will be communicated to students via the course web site. This web site should be checked regularly and frequently for such changes and announcements.

Students requesting accommodations due to a disability at Drexel University need to request a current Accommodations Verification Letter (AVL) in the ClockWork database before accommodations can be made. These requests are received by Disability Resources (DR), who then issues the AVL to the appropriate contacts. For additional information, visit the DR website at drexel.edu/oed/disabilityResources/overview/, or contact DR for more information by phone at 215.895.1401, or by email at disability@drexel.edu.