Topics in Programming Languages: Automated Testing, Bug Detection, and Program Analysis
CPSC 539L, Winter Term 2 2025 (Jan-Apr 2026), Instructor: Caroline Lemieux
(clemieux@cs.ubc.ca)
Tu/Th 11:00AM-12:30PM, SWNG 210
Quick Links
Format • Schedule • Academic Honesty • Responses • Project • Discussion Leads
Course description
Software bugs remain pervasive in modern software systems. As software becomes increasingly intertwined
with all aspects of our lives, the consequences of these bugs become increasingly severe. For instance,
since the mid 2010s, several important security vulnerabilities have emerged from basic correctness bugs
in software (e.g., Heartbleed, Cloudbleed, Shellshock). Though not all bugs
cause important security vulnerabilities, their overall cost remains high: one estimate puts the cost of
dealing with software failures and defects in the United States alone in 2018 at nearly
USD$1,540,000,000; the worldwide impact of these defects is even more severe.
While human-written tests are a prevailing method for assuring the quality of software in practice, they
are still subject to human error, allowing bugs that the human tester did not consider to slip through the
cracks. This class will explore topics in the field of automated testing, which aims, at a
high level, to reduce the developer burden in discovering bugs and writing test cases. Recent advances in
this field leverage static and dynamic information from the program under test in order to effectively
explore its space of behaviors, enabling the quick discovery of potentially severe security bugs.
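To make the flavor of these techniques concrete, here is a minimal sketch (in Python) of the core loop of a coverage-guided (greybox) fuzzer, one of the techniques we will study. This is an illustration only, not any particular tool's implementation: the run_with_coverage harness and the byte-flipping mutator are hypothetical stand-ins for the instrumentation and mutation operators that a real fuzzer such as AFL provides.

```python
import random

def mutate(data: bytes) -> bytes:
    """Deliberately naive mutator: overwrite one random byte."""
    if not data:
        return bytes([random.randrange(256)])
    i = random.randrange(len(data))
    return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]

def fuzz(run_with_coverage, seeds, iterations=10_000):
    """Sketch of a coverage-guided fuzzing loop.

    run_with_coverage(data) is a hypothetical test harness that executes the
    program under test on `data` and returns (crashed, coverage), where
    coverage is the set of branches/edges exercised by that execution.
    """
    corpus = list(seeds)
    total_coverage = set()
    crashes = []
    for _ in range(iterations):
        parent = random.choice(corpus)        # pick an existing input
        child = mutate(parent)                # mutate it
        crashed, coverage = run_with_coverage(child)
        if crashed:
            crashes.append(child)             # save crashing inputs for triage
        elif not coverage <= total_coverage:  # input exercised new behavior
            corpus.append(child)              # keep it for further mutation
            total_coverage |= coverage
    return corpus, crashes
```

Roughly speaking, a blackbox fuzzer is the same loop with the coverage feedback removed (each input is generated independently of how previous inputs behaved), while symbolic execution replaces the random mutation with constraint solving over path conditions.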
The concerns of scalable bug detection and program debugging require innovation on multiple fronts.
As such, this course will cover papers from Software Engineering, Programming Languages, Security, and
Systems venues.
Course-level learning goals
After the course, participants should be able to:
- For each of the following techniques, describe the problem it solves, recognize the scenarios in which it can be used, and analyze its pros and cons:
- Test-input generation: blackbox fuzzing, greybox fuzzing, symbolic execution
- Test oracles: crashing oracles, differential oracles, automated test-oracle generation
- Property-based testing (a short illustrative sketch follows this list)
- Whole test-suite generation
- Pattern-based Static Analysis
- Fault Localization
- Dynamic Data Race Detection
- Identify contributions and room for improvement in existing research papers;
- Debate the positives and negatives of existing research papers;
- Categorize and summarize concerns and questions about existing research papers;
- Design and conduct the experimental evaluation of a program testing or analysis tool.
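To give a concrete taste of one item above, property-based testing, here is a small hand-rolled sketch in Python in the spirit of QuickCheck (and its Python descendant Hypothesis). The property, generator, and trial count are toy choices for illustration, not drawn from any of the course readings.

```python
import random

def random_int_list(max_len: int = 20) -> list[int]:
    """Input generator: a random list of small integers."""
    return [random.randint(-100, 100) for _ in range(random.randrange(max_len))]

def prop_sort_is_idempotent(xs: list[int]) -> bool:
    """Property under test: sorting an already-sorted list changes nothing."""
    return sorted(sorted(xs)) == sorted(xs)

def check_property(prop, generate, trials: int = 1000) -> bool:
    """Run the property on many generated inputs; print any counterexample."""
    for _ in range(trials):
        xs = generate()
        if not prop(xs):
            print("Counterexample found:", xs)
            return False
    print(f"Property held on {trials} random inputs.")
    return True

if __name__ == "__main__":
    check_property(prop_sort_is_idempotent, random_int_list)
```

Real property-based testing libraries add generator combinators and automatic shrinking of counterexamples, which is much of what makes them practical.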
Format
This is a seminar-style class, which is primarily driven by in-class discussion of a research paper. Each
student is expected to submit a response to the paper being discussed, by 4pm the day before the class in
which it is discussed. Each student will also serve as discussion lead for 1-2 papers; the discussion lead
will read through other students' responses and prepare questions to start the discussion. Around half of
your course mark comes from participating in discussions and submitting paper responses.
The bulk of your course mark will come from an open-ended course project. There are two options for the course project:
- Research project: propose and conduct a small research project. As it is within the scope of a single academic semester, it may not be a "full" project, but more of a preliminary investigation. The project should be structured to give you some experience with research implementation, evaluation, or data analysis.
- Open-source contribution: this year I am offering the option to use a contribution to open-source software as the course project. Definitely in scope are contributions that improve fuzz testing, software correctness, or software security more broadly. The project should be structured to give you experience with software project planning and implementation.
If you are doing a research project, you are strongly encouraged to conduct it as
a team of 2-4 students. Students may ask for exceptions to conduct individual projects. If you are doing an open-source contribution, I suggest a team of 1-2 students, depending on the size of the contribution.
The project deliverables include an initial proposal, a project check-in, an in-class presentation of the work, and a
final report summarizing what has been achieved.
Note that this structure is quite distinct from most undergraduate courses, which are heavily structured and assessment-driven. The open-ended nature of
the responses and the project can be somewhat unsettling if you are coming straight from such courses. The goal of this course
is to expose students to the field of automated testing and foster their research abilities, as outlined in the learning goals above.
Grading (Subject to Change)
- Paper Responses: 22.5%
- In-class Participation: 15%
- Discussion Lead: 7.5%
- Project Proposal: 12.5%
- Project Check-in: 2.5%
- Project Presentation: 10%
- Project Writeup: 30%
Schedule
Note (July 2025): this schedule is provided for your information, to give you an idea of what we might cover! We will likely have fewer papers in the 2025W2 semester, and the order of papers might change.
Course Overview, Introductions
Slides from class.
Suggested Reading: Sections 1, 1.1 (not 1.1.1, 1.1.2), and 1.2 (not 1.2.1) of Caroline's Dissertation.
Optional Reading (do not write a response): How to Read a Paper, Srinivasan Keshav.
Sign up for the course Piazza.
DL: Caroline
Random (a.k.a. Blackbox) Fuzzing
Barton P. Miller, Mengxiao Zhang, and Elisa R. Heymann. 2022. The Relevance of Classic Fuzz Testing: Have We Solved This One?
IEEE Trans. Softw. Eng. 48, 6 (June 2022), 2028–2039.
DL: Caroline
Coverage-Guided (Greybox) Fuzzing
Andrea Fioraldi, Alessandro Mantovani, Dominik Maier, and Davide Balzarotti. 2023.
Dissecting American Fuzzy Lop: A FuzzBench Evaluation. ACM Trans. Softw. Eng. Methodol. 32, 2, Article 52 (March 2023), 26 pages.
Suggested Reading: Section 2.1 of Caroline's Dissertation.
DL: Caroline
Property-Based Testing (aka Generator-Based Fuzzing).
Koen Claessen and John Hughes. 2000. QuickCheck: a lightweight tool for random testing of Haskell programs. In Proceedings of the fifth ACM SIGPLAN international conference on Functional programming (ICFP '00). Association for Computing Machinery, New York, NY, USA, 268–279.
DL:
Combining Coverage-Guided and Generator-Based Fuzzing.
Rohan Padhye, Caroline Lemieux, Koushik Sen, Mike Papadakis, and Yves Le Traon. 2019. Semantic fuzzing with zest. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2019). Association for Computing Machinery, New York, NY, USA, 329–340.
slides
DL:
Symbolic Execution
Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX conference on Operating systems design and implementation (OSDI'08). USENIX Association, USA, 209–224.
DL:
Dynamic Symbolic Execution (Concolic Execution)
Koushik Sen, George Necula, Liang Gong, and Wontae Choi. 2015. MultiSE: multi-path symbolic execution using value summaries. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). Association for Computing Machinery, New York, NY, USA, 842–853.
DL:
Neural Networks + Fuzzing
D. She, K. Pei, D. Epstein, J. Yang, B. Ray and S. Jana. 2019.
NEUZZ: Efficient Fuzzing with Neural Program Smoothing.
2019 IEEE Symposium on Security and Privacy (SP). San Francisco, CA, USA, pp. 803-817.
DL:
Neural Networks + Fuzzing Continued
Maria-Irina Nicolae, Max Eisele, and Andreas Zeller. 2023.
Revisiting Neural Program Smoothing for Fuzzing.
In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 133–145.
DL:
Beating Magic Bytes.
Cornelius Aschermann, Sergej Schumilo, Tim Blazytko, Robert Gawlik, Thorsten Holz. 2019.
REDQUEEN: Fuzzing with Input-to-State Correspondence.
In Proceedings of the Network and Distributed System Security Symposium (NDSS '19).
DL:
Evaluating Randomized Algorithms.
Andrea Arcuri and Lionel Briand. 2011.
A practical guide for using statistical tests to assess randomized algorithms in software engineering.
In Proceedings of the 33rd International Conference on Software Engineering (ICSE '11). Association for Computing Machinery, New York, NY, USA, 1–10.
Project Proposal Due Oct 11th, 5pm, by email.
DL:
Whole Test Suite Generation
Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, and Thomas Ball. 2007.
Feedback-Directed Random Test Generation.
In Proceedings of the 29th international conference on Software Engineering (ICSE '07). IEEE Computer Society, USA, 75–84.
DL:
Test Suite Generation + LLMs
Caroline Lemieux, Jeevana Priya Inala, Shuvendu K. Lahiri, and Siddhartha Sen. 2023.
CodaMosa: Escaping Coverage Plateaus in Test Generation with Pre-Trained Large Language Models.
In Proceedings of the 45th International Conference on Software Engineering (ICSE '23). IEEE Press, 919–931.
DL:
Test Oracle Generation
Elizabeth Dinella, Gabriel Ryan, Todd Mytkowicz, and Shuvendu K. Lahiri. 2022.
TOGA: a neural method for test oracle generation.
In Proceedings of the 44th International Conference on Software Engineering (ICSE '22). Association for Computing Machinery, New York, NY, USA, 2130–2141.
DL:
Project Work Day
Pattern-Based Static Analysis
David Hovemeyer and William Pugh. 2004.
Finding bugs is easy. SIGPLAN Not. 39, 12 (December 2004), 92–106.
DL:
Type-Annotation Static Analysis
Martin Kellogg, Vlastimil Dort, Suzanne Millstein, and Michael D. Ernst. 2018.
Lightweight verification of array indexing.
In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018). Association for Computing Machinery, New York, NY, USA, 3–14.
DL:
Delta-Debugging
Andreas Zeller and Ralf Hildebrandt. 2002.
Simplifying and Isolating Failure-Inducing Input.
IEEE Trans. Softw. Eng. 28, 2 (February 2002), 183–200.
DL:
Fault Localization.
James A. Jones, Mary Jean Harrold, and John Stasko. 2002. Visualization of test information to assist fault localization.
In Proceedings of the 24th International Conference on Software Engineering (ICSE '02). Association for Computing Machinery, New York, NY, USA, 467–477.
Project Check-in Due Nov 8th, 5pm, by email.
DL:
Midterm Break. No Class.
Project Work Day
Compiler Testing
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011.
Finding and understanding bugs in C compilers.
In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). Association for Computing Machinery, New York, NY, USA, 283–294.
DL:
Compiler Testing in the Era of LLMs
Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, and Lingming Zhang. 2024.
Fuzz4All: Universal Fuzzing with Large Language Models.
In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE '24). Association for Computing Machinery, New York, NY, USA, Article 126, 1–13.
DL:
Bug Detection in Concurrent Programs
Robert O'Callahan and Jong-Deok Choi. 2003.
Hybrid dynamic data race detection.
In Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '03). Association for Computing Machinery, New York, NY, USA, 167–178.
DL:
Testing Programs that Interact with the Network
Sergej Schumilo, Cornelius Aschermann, Andrea Jemmett, Ali Abbasi, and Thorsten Holz. 2022. Nyx-net: network fuzzing with incremental snapshots. In Proceedings of the Seventeenth European Conference on Computer Systems (EuroSys '22). Association for Computing Machinery, New York, NY, USA, 166–180.
DL:
Project presentations
Project presentations
Academic Honesty
You will receive a zero on any written submission (paper response, proposal, final write-up) that is found to contain plagiarised material. This includes even a single plagiarised sentence, and covers plagiarism from other students as well as from other research papers. Refer to this page for more details on the definition of plagiarism, and for tips on how to avoid it.
There is zero tolerance for the use of GenAI to generate written materials for the course, including paper responses, project proposals, etc. That is, all writing submitted for the course should be produced by a human. I will always favour short, clear, human-written text over long amounts of GenAI-written text. You are free to use GenAI for coding tasks in your course project; however, you will need to be able to assess whether the coding task was implemented correctly.
Paper Responses
Read each paper to the best of your ability before drafting your paper response. There are two goals to the paper responses:
- Convey your understanding of the paper's core innovations
- Convey your opinion of the paper, in order to spark in-class discussion
Structure your paper responses as:
- A short paragraph summarizing the paper. What is the problem being tackled? How was it addressed by prior work? What are the innovation(s) proposed in this paper? How are those innovations evaluated?
- Your opinion of the paper, which can be structured as a paragraph or bullet points. Some questions to guide you as you read and write your response:
  - What remains unclear after reading the paper? Are there any clarification questions whose answers would substantially change your opinion of the paper?
  - How does the paper's evaluation match the proposed problem statement?
  - Do you foresee any barriers to the applicability of the technique proposed in the paper? If so, how could these barriers be overcome?
  - Which technical innovations are most compelling to you?
  - Which problems remain unsolved after this paper?
Remember, the responses help spark in-class discussion. Thus, specificity and depth in your responses are useful.
A detailed description of a particular issue raised while reading the paper will be more useful than several high-level comments.
For example, if you have doubts about the scalability of a particular technique, detail specific examples that raise those doubts,
rather than simply saying "I think the technique will not scale".
Similarly, do not use GenAI to create your paper response. It is much more useful to see a response that mentions uncertainty or confusion you faced while reading the paper than a vague summary of the paper with only high-level questions. Also remember that in-class participation is 15% of your course grade: this is measured by your spoken contributions during class. It is easier to contribute to the discussion if you have engaged with the paper yourself!
Paper responses must be submitted to the course Piazza by 4:00pm on the day before the class in which the paper is discussed.
Thus, for Tuesday readings, your paper response should be submitted before 4:00pm on Monday; for Thursday readings,
before 4:00pm on Wednesday. This is to give the discussion lead ample time to read the paper
responses from other students.
Your paper responses will be submitted as follow-up discussions to each paper's post on the course Piazza.
You should not read other students' responses before writing your own. After posting your response, you may engage with other students' responses, e.g., by answering a clarification question someone else had.
Project
The course project constitutes the majority of the course grade. We will have a project proposal day ahead of the actual project deadlines, so you can hear what fellow students are thinking of working on, pitch your own project, and form groups.
Research Project Option
The purpose of the project is to give you the opportunity to conduct research in the field of program analysis, debugging, or automated testing, and to develop research skills (literature review, experimentation, etc.).
I recommend you do the project in groups of 2-4; this will give you the chance to conduct a project with broader impact.
The scope of acceptable project topics is broad. You are encouraged to find a project that intersects with your
own research interests or area of specialization; however, your project should touch on topics in debugging, program analysis, or
testing. You may consider a project where you: develop a new tool for testing/analyzing programs in a particular domain,
tweak an existing algorithm, conduct an extensive re-evaluation of an existing tool, reimplement an
algorithm in a new domain, or create a benchmark suite.
If you are not sure whether a project is in scope, feel free to set up a meeting with me prior to submitting the project proposal.
Project Ideas. Here are some sample project ideas, of varying complexity and scope. You need not restrict yourself to these ideas. Feel free to ask me for more details on these.
- Compare the performance of CodaMOSA to a hinting strategy that pulls example code snippets from the existing project code base (rather than invoking an LLM).
- Build a dataset of OSS-Fuzz-found bugs and fixes. Evaluate an existing program repair tool on this dataset.
Previous Years' Projects. Here are some projects from previous years:
- Implementing JaCoCo as a coverage backend to JQF.
- Creating a tool to find NaN poisoning exploits in C/C++. Built on LLVM tooling.
- Creating an automated testing tool (based on property-based testing) for Haskell Rewrite Rules.
Open-Source Contribution Option
In this version of the project, you will develop more software engineering skills. You will need to identify a relevant issue to fix, estimate the work necessary to make the fix, propose success metrics for your fix, implement the improvements, and evaluate your project against your previously-proposed success metrics.
Propose a contribution to open-source software that improves software correctness/security/performance. Here are a few ideas:
- Find an open issue in a software repository of interest.
- OSS-Fuzz contributions: e.g. fix an OSS-Fuzz-found vulnerability; integrate a new project into OSS-Fuzz (you'll want more than 1 person for this!)
- security-improving patches of existing software: see the page for details.
- Address a bug you have found, in software you depend on for your research!
- ...some other idea, discuss with Caroline.
When proposing a contribution, you should clearly describe the problem you want to solve as well as your success metrics. Will you write diverse test cases that currently do not pass, and ensure they pass after your improvements? Or will you write a regression test suite, if you are making a security/performance improvement that should not change the software's behavior? Perhaps the issue already suggests success metrics? If the scope of the contribution is large, consider intermediate success metrics.
At the end of term, evaluate your project against these success metrics. If any unexpected issues made you unable to meet them, explain why in your final report. If you have met all your success metrics and are confident the contribution improves the state of the project (multiple team members can help with this!), the contribution can be submitted to maintainers before the end of the course.
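As an example of what a concrete success metric might look like, the sketch below shows a pytest-style regression test that fails on the current code and should pass once a (hypothetical) fix lands. The module mypkg and the function parse_header are placeholders for whatever project you actually contribute to; this is only an illustration of the kind of artifact you could commit to producing.

```python
# test_regression_issue_1234.py
# Hypothetical regression test used as a success metric for a bug-fixing
# contribution. "mypkg" and "parse_header" are illustrative placeholders.
import pytest

from mypkg import parse_header  # placeholder import for the project under test

def test_empty_header_raises_clean_error():
    # Before the fix: an empty header crashes with an unhandled IndexError.
    # After the fix: it should be rejected with a clear ValueError instead.
    with pytest.raises(ValueError):
        parse_header(b"")
```

A metric of this form is easy to evaluate at the end of term: either the test is merged and passing, or you can explain precisely why not.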
Instructions for Discussion Leads
You will serve as discussion lead for 1-2 papers in the class. As a discussion lead,
you do not need to submit a response on Piazza for the paper being read. However, you
should read the paper in detail, as you will be the "go-to" person for clarification
questions during discussion. Read through other students' responses, and identify
common themes and interesting questions.
Prepare the following materials:
- A 10-15 minute summary of the key technical contributions of the paper. You
may make a slide deck or use the whiteboard. Caroline will by default project
the paper for reference during discussion.
- Identify 3 common themes, future work suggestions, or points of contention from your reading and the other students' responses. Be prepared to briefly explain these to jump-start discussion.
- Identify any clarification questions from your own reading of the paper and other students' responses. If you are not able to fully resolve some of the questions, feel free to point this out! Collect the parts of the paper/related work you had difficulty understanding. We can work together as a group to clarify the questions.
This course is loosely based on Koushik Sen's 2016 offering of CS 294-15.