Topics in Programming Languages: Automated Testing, Bug Detection, and Program Analysis
CPSC 539L, Winter Term 2 2025 (Jan-Apr 2026), Instructor: Caroline Lemieux
(clemieux@cs.ubc.ca)
Tu/Th 11:00AM-12:30PM, SWNG 210
|
|
Quick Links
Format • Schedule • Academic Honesty • Responses • Project • Discussion Leads
|
Course description
Software bugs remain pervasive in modern software systems. As software becomes increasingly intertwined
with all aspects of our lives, the consequences of these bugs become increasingly severe. For instance,
since the mid 2010s, several important security vulnerabilities have emerged from basic correctness bugs
in software (e.g., Heartbleed, Cloudbleed, Shellshock). Though not all bugs
cause important security vulnerabilities, their overall cost remains high: one estimate puts the cost of
dealing with software failures and defects in the United States alone in 2018 at nearly
USD $1,540,000,000; the worldwide impact of these defects is even more severe.
While human-written tests are a prevailing method for assuring the quality of software in practice, they
are still subject to human error, allowing bugs the human tester did not consider to slip through the
cracks. This class will explore topics in the field of automated testing, which aims, at a
high level, to reduce the developer burden in discovering bugs and writing test cases. Recent advances in
this field leverage static and dynamic information from the program under test in order to effectively
explore its space of behaviors, enabling the quick discovery of potentially severe security bugs.
Scalable bug detection and program debugging require innovation on multiple
fronts.
As such, this course will cover papers from Software Engineering, Programming Languages, Security, and
Systems venues.
Course-level learning goals
After the course, participants should be able to:
- Describe the problems solved by, recognize the scenarios in which to apply, and analyze the pros and cons of each of the following techniques (for a concrete flavour, see the small sketch after this list):
- Test-input generation: blackbox fuzzing, greybox fuzzing, symbolic execution
- Test oracles: crashing oracles, differential oracles, automated test-oracle generation
- Property-based testing
- Whole test-suite generation
- Dynamic data race detection
- Identify contributions and room for improvement in existing research papers;
- Debate the positives and negatives of existing research papers;
- Categorize and summarize concerns and questions about existing research papers;
- Design and conduct the experimental evaluation of a program testing or analysis tool; or design, plan, and implement an open-source contribution.
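For a flavour of a few of these techniques, here is a minimal, purely illustrative Python sketch (not drawn from any paper on the schedule; my_atoi is a hypothetical, deliberately buggy program under test) of blackbox fuzzing with a crashing oracle and a differential oracle:

    import random
    import string

    def random_input(max_len=16):
        # Blackbox: inputs are generated with no knowledge of the program's internals.
        alphabet = string.digits + "-+ "
        return "".join(random.choice(alphabet) for _ in range(random.randint(0, max_len)))

    def my_atoi(s):
        # Hypothetical "program under test" with a seeded bug: crashes on empty input.
        if s[0] == "-":                      # IndexError when s == ""
            return -int(s[1:])
        return int(s)

    def fuzz(trials=10_000):
        for _ in range(trials):
            s = random_input()
            # Crashing oracle: any exception other than the documented ValueError is a bug.
            try:
                got = my_atoi(s)
            except ValueError:
                continue
            except Exception as e:
                print(f"crash on {s!r}: {e!r}")
                continue
            # Differential oracle: disagreement with a reference implementation is a bug.
            try:
                expected = int(s)
            except ValueError:
                expected = None
            if got != expected:
                print(f"difference on {s!r}: got {got}, reference {expected}")

    if __name__ == "__main__":
        fuzz()

Greybox fuzzing and symbolic execution refine the input-generation step (using coverage feedback and path constraints, respectively), while property-based testing replaces the oracles above with user-specified properties.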
|
|
Format
This is a seminar-style class, primarily driven by in-class discussion of research papers. Each
student is expected to submit a response to each paper
by 4pm the day before the class in which it is discussed. Each student will also serve as discussion lead for 1-2
papers; the discussion lead will read through other students' responses and
prepare questions to start the discussion. Around half of your course mark comes from
participating in discussions and submitting paper responses.
The bulk of your course mark will come from an open-ended course project. There are two options for the course project:
- Research project: propose and conduct a small research project. As it must fit within a single academic term, it may not be a "full" project, but rather a preliminary investigation. The project should be structured to give you some experience with research implementation, evaluation, or data analysis.
- Open-source contribution: this year I am offering the option to use a contribution to open-source software as the course project. Definitely in scope are contributions that support fuzz testing, software correctness, and software security more broadly. The project should be structured to give you experience with software project planning and implementation.
If you are doing a research project, it is strongly encouraged that it be conducted by
a team of 2-4 students. Students may ask for exceptions to conduct individual projects. If you are doing an open-source contribution, I suggest a team of 1-2 students, depending on the size of the contribution.
The project deliverables include an initial proposal, a project check-in, an in-class presentation of the work, and a
final report summarizing what has been achieved.
Note that this structure is quite distinct from most undergraduate courses, which are heavily structured and assessment-driven. The open-ended nature of
the responses and the project can be somewhat unsettling if you are coming straight from such courses. The goal of this course
is to expose students to the field of automated testing and foster their research abilities, as outlined in the learning goals above.
Grading (Subject to Change)
- Paper Responses: 15%
- In-class Participation: 20%
- Discussion Lead: 12%
- Project Proposal: 9%
- Project Check-in: 2%
- Project Presentation: 12%
- Project Writeup: 30%
|
Schedule
|
Tue, Jan 6th
|
Course Overview, Introductions
Suggested Reading (do not write a response): Sections 1, 1.1 (not 1.1.1, 1.1.2), and 1.2 (not 1.2.1) of Caroline's Dissertation.
Optional Reading (do not write a response): How
to Read a Paper, Srinivasan Keshav.
Sign up for course piazza: TBD.
|
DL: Caroline
|
| Thu, Jan 8th |
Random (a.k.a. Blackbox) Fuzzing
Barton P. Miller, Lars Fredriksen, and Bryan So. 1990.
An empirical study of the reliability of UNIX utilities.
Commun. ACM 33, 12 (Dec. 1990), 32–44.
Suggested Reading: Section 2.1 of Caroline's Dissertation.
|
DL: Caroline
|
| Tue, Jan 13th |
Compiler Testing
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011.
Finding and understanding bugs in C compilers.
In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). Association for Computing Machinery, New York, NY, USA, 283–294.
|
DL:
|
| Thu, Jan 15th |
Compiler Testing in the Era of LLMs
Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, and Lingming Zhang. 2024.
Fuzz4All: Universal Fuzzing with Large Language Models.
In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE '24). Association for Computing Machinery, New York, NY, USA, Article 126, 1–13.
|
DL:
|
| Tue, Jan 20th |
Beating Magic Bytes.
Cornelius Aschermann, Sergej Schumilo, Tim Blazytko, Robert Gawlik, Thorsten Holz. 2019.
REDQUEEN: Fuzzing with Input-to-State Correspondence.
In Proceedings of the Network and Distributed System Security Symposium (NDSS '19).
|
DL:
|
| Thu, Jan 22nd |
Combining Coverage-Guided and Generator-Based Fuzzing.
Rohan Padhye, Caroline Lemieux, Koushik Sen, Mike Papadakis, and Yves Le Traon. 2019. Semantic fuzzing with zest. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2019). Association for Computing Machinery, New York, NY, USA, 329–340.
|
DL: Caroline
|
| Tue, Jan 27th |
Whole Test Suite Generation
Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, and Thomas Ball. 2007.
Feedback-Directed Random Test Generation.
In Proceedings of the 29th international conference on Software Engineering (ICSE '07). IEEE Computer Society, USA, 75–84.
|
DL:
|
| Thu, Jan 29th |
Test Suite Generation + LLMs
Caroline Lemieux, Jeevana Priya Inala, Shuvendu K. Lahiri, and Siddhartha Sen. 2023.
CodaMosa: Escaping Coverage Plateaus in Test Generation with Pre-Trained Large Language Models.
In Proceedings of the 45th International Conference on Software Engineering (ICSE '23). IEEE Press, 919–931.
|
DL:
|
| Tue, Feb 3rd |
Project pitches
Students will give a short "chalk talk" on their project ideas. The rest of the class will be devoted to talking with other students about their projects to form teams, or talking with Caroline to refine the project ideas.
|
DL: All
|
| Thu, Feb 5th |
Symbolic Execution
Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX conference on Operating systems design and implementation (OSDI'08). USENIX Association, USA, 209–224.
|
DL:
|
| Tue, Feb 10th |
Inferring Test Oracles from Documentation
Alberto Goffi, Alessandra Gorla, Michael D. Ernst, and Mauro Pezzè. 2016.
Automatic generation of oracles for exceptional behaviors.
In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016). Association for Computing Machinery, New York, NY, USA, 213–224.
|
DL:
|
| Thu, Feb 12th |
Generating Test Oracles using LLMs
Soneya Binta Hossain and Matthew B. Dwyer. 2025.
TOGLL: Correct and Strong Test Oracle Generation with LLMs.
In Proceedings of the IEEE/ACM 47th International Conference on Software Engineering (ICSE '25). IEEE Press, 1475–1487.
Project Proposal Due Feb 15th, 11:59PM, by email.
|
DL:
|
| Tue, Feb 17th |
Reading Week; no class, no reading.
|
DL:
|
| Thu, Feb 19th |
Reading Week; no class, no reading.
|
DL:
|
| Tue, Feb 24th |
Delta-Debugging
Andreas Zeller and Ralf Hildebrandt. 2002.
Simplifying and Isolating Failure-Inducing Input.
IEEE Trans. Softw. Eng. 28, 2 (February 2002), 183–200.
|
DL:
|
| Thu, Feb 26th |
Pattern-Based Static Analysis
David Hovemeyer and William Pugh. 2004.
Finding bugs is easy. SIGPLAN Not. 39, 12 (December 2004), 92–106.
|
DL:
|
| Tue, Mar 3rd |
Synthesizing Program Input Grammars (Spec Mining)
Osbert Bastani, Rahul Sharma, Alex Aiken, and Percy Liang. 2017.
Synthesizing program input grammars.
In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). Association for Computing Machinery, New York, NY, USA, 95–110.
|
DL:
|
| Thu, Mar 5th |
So... Do They Actually Mine Grammars?
Bachir Bendrissou, Rahul Gopinath, and Andreas Zeller. 2022.
"Synthesizing input grammars": a replication study.
In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2022). Association for Computing Machinery, New York, NY, USA, 260–268.
|
DL:
|
| Tue, Mar 10th |
Code-Comment Inconsistency Detection
Lin Tan, Ding Yuan, Gopal Krishna, and Yuanyuan Zhou. 2007.
/*iComment: Bugs or Bad Comments?*/
In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles (SOSP '07). Association for Computing Machinery, New York, NY, USA, 145–158.
|
DL:
|
| Thu, Mar 12th |
Inconsistency Detection and Rectification with LLMs
Guoping Rong, Yongda Yu, Song Liu, Xin Tan, Tianyi Zhang, Haifeng Shen, and Jidong Hu. 2025.
Code Comment Inconsistency Detection and Rectification Using a Large Language Model.
In Proceedings of the IEEE/ACM 47th International Conference on Software Engineering (ICSE '25). IEEE Press, 1832–1843.
|
DL:
|
| Tue, Mar 17th |
Project Check-in Due Mar 16th, 11:59PM, by email.
Project Updates
Students will give a short "chalk talk" on their project progress so far. The rest of the class will be devoted to talking with team members, talking with other students about what directions to pursue, or talking with Caroline to refine the next steps of the project.
|
DL: All
|
| Thu, Mar 19th |
Project Work Day.
No formal class; open office hours in the classroom.
|
|
| Tue, Mar 24th |
Bug Detection in Concurrent Programs
Robert O'Callahan and Jong-Deok Choi. 2003.
Hybrid dynamic data race detection.
In Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '03). Association for Computing Machinery, New York, NY, USA, 167–178.
|
DL:
|
| Thu, Mar 26th |
Scaling Concurrency Bug Detection
Guangpu Li, Shan Lu, Madanlal Musuvathi, Suman Nath, and Rohan Padhye. 2019.
Efficient scalable thread-safety-violation detection: finding thousands of concurrency bugs during testing.
In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19). Association for Computing Machinery, New York, NY, USA, 162–180.
|
DL:
|
| Tue, Mar 31st |
Topic TBD Based on Class Interest
|
DL:
|
| Thu, Apr 2nd |
Topic TBD Based on Class Interest
|
DL:
|
| Tue, Apr 7th |
Project presentations
|
|
| Thu, Apr 9th |
Project presentations
|
|
|
Academic Honesty
You will receive zero on any written submission (paper response, proposal, final write-up) that is found to have plagiarized material. This includes even a single plagiarized sentence, and it covers plagiarism from other students as well as from other research papers. Refer to this page for more details on the definition of plagiarism and tips on how to avoid it.
There is zero tolerance for the use of GenAI to generate written materials for the course, including paper responses, project proposals, etc. That is, all writing submitted for the course should be produced by a human. I will always favour short, clear, human-written text over long stretches of GenAI-written text. You are free to use GenAI for coding tasks in your course project; however, you will need to be able to assess whether the coding task was implemented correctly.
|
Paper Responses
Read each paper to the best of your ability before drafting your paper response. The paper responses have two goals:
- Convey your understanding of the paper's core innovations
- Convey your opinion of the paper, in order to spark in-class discussion
Structure your paper responses as follows:
- A short paragraph summarizing the paper. What is the problem being tackled? How was it addressed by prior work? What are the innovation(s) proposed in this paper? How are those innovations evaluated?
- Your opinion of the paper, which can be structured as a paragraph or as bullet points. Some questions to guide you as you read and write your response:
  - What remains unclear after reading the paper? Are there any clarification questions whose answers would substantially change your opinion of the paper?
  - How does the paper's evaluation match the proposed problem statement?
  - Do you foresee any barriers to the applicability of the technique proposed in the paper? If so, how could these barriers be overcome?
  - Which technical innovations are most compelling to you?
  - Which problems remain unsolved after this paper?
Remember, the responses help spark in-class discussion. Thus, specificity and depth in your responses are useful.
A detailed description of a particular issue raised while reading the paper will be more useful than several high-level comments.
For example, if you have doubts about the scalability of a particular technique, detail specific examples that raise those doubts,
rather than simply saying "I think the technique will not scale".
Similarly, do not use GenAI to create your paper response. It is much more useful to see a response that mentions uncertainty or confusion you faced while reading the paper than a vague summary of the paper that asks only high-level questions. Also remember that in-class participation is a significant portion of your course grade (see Grading above): it is measured by your spoken contributions during class. It is easier to contribute to the discussion if you have engaged with the paper yourself!
Paper responses must be submitted to the course Piazza by 4:00pm on the day before the class in which the paper is discussed.
Thus, for Tuesday readings, your paper response should be submitted before 4:00pm on Monday. For Thursday readings,
they should be submitted before 4:00pm on Wednesday. This is to give the discussion lead ample time to read the paper
responses from other students.
Your paper responses will be submitted as follow-up discussions to each paper's post on the course Piazza.
You should not read other students' responses before writing your own. After posting your response, you may engage with other students' responses, e.g., by answering a clarification question someone else had.
|
Project
The course project constitutes the majority of the course grade. We will have a project pitch day well ahead of the project deadlines, so you can hear what fellow students are thinking of working on, propose your own project ideas, and form groups.
Research Project Option
The purpose of the project
is to give you the opportunity to conduct research in the field of program analysis, debugging, or automated testing, and to develop research skills (literature review, experimentation, etc.).
I recommend doing the project in a group of 2-4; this will let you conduct a project with broader impact.
The scope of acceptable project topics is broad. You are encouraged to find a project that intersects with your
own research interests or topic of specialization. However, your project should touch on topics in debugging, program analysis, or
testing. You may consider a project where you: develop a new tool for testing/analyzing programs in a particular domain,
tweak an existing algorithm, conduct an extensive re-evaluation of an existing tool, reimplement an
algorithm in a new domain, or create a benchmark suite.
If you are not sure if a project is in scope, feel free to set up a meeting with me prior to submitting the project proposal.
Project Ideas. Here are some sample project ideas, of varying complexity and scope. You need not restrict yourself to these ideas. Feel free to ask me for more details on these.
- Compare the performance of CodaMOSA to a hinting strategy that pulls example code snippets from the existing project code base (rather than invoking an LLM).
- Build a dataset of OSS-Fuzz-found bugs and fixes. Evaluate an existing program repair tool on this dataset.
Previous Year's Projects. Here are some projects from previous years:
- Implementing JaCoCo as a coverage backend to JQF.
- Creating a tool to find NaN poisoning exploits in C/C++. Built on LLVM tooling.
- Creating an automated testing tool (based on property-based testing) for Haskell Rewrite Rules.
Open-Source Contribution Option
In this version of the project, you will focus on developing software engineering skills. You will need to identify a relevant issue to fix, estimate the work necessary to implement the fix, propose success metrics for your fix, implement the improvements, and evaluate your project against your previously proposed success metrics.
Propose a contribution to open-source software that improves software correctness/security/performance. Here are a few ideas:
- Find an open issue in a software repository of interest.
- OSS-Fuzz contributions: e.g., fix an OSS-Fuzz-found vulnerability, or integrate a new project into OSS-Fuzz (you'll want more than one person for this!)
- security-improving patches of existing software: see the page for details.
- Address a bug you have found in software you depend on for your research!
- ...some other idea; discuss with Caroline.
When proposing a contribution, you should clearly describe the problem you want to solve as well as your success metrics. Will you write diverse test cases that currently do not pass, and ensure they pass after your improvements? Or will you write a regression test suite, if you are making a security/performance improvement that should not change the software's behavior? Perhaps the issue has suggested success metrics already? If the scope of the contribution is large, consider intermediate success metrics.
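As a purely illustrative example (the project mylib, its parse_length function, and the issue number are all hypothetical), success metrics expressed as pytest tests might look like the following sketch:

    import pytest
    import mylib  # hypothetical open-source project you are contributing to

    def test_issue_1234_large_lengths():
        # New test case: fails before the fix, must pass after it.
        assert mylib.parse_length(2**31) == 2**31

    def test_small_lengths_unchanged():
        # Regression check: the fix must not change behaviour on ordinary inputs.
        assert mylib.parse_length(42) == 42

    def test_negative_lengths_still_rejected():
        # Regression check: previously documented error behaviour is preserved.
        with pytest.raises(ValueError):
            mylib.parse_length(-1)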
At the end of term, evaluate your project against these success metrics. If any unexpected issues made you unable to meet these metrics, describe them. If you have met all your success metrics, and are confident the contribution improves the state of the project (multiple team members can help with this!), the contribution can be submitted to the maintainers before the end of the course.
Deliverables
Proposal.
Your project proposal should roughly follow this structure, adapting to the nature of your project (new tool vs. replication vs. benchmark creation vs. implementation):
- Background. What is the problem domain you are investigating? Why is the problem important?
- Intended approach. What, precisely, are you aiming to build or investigate in the project? What precise problem will you be solving?
- Evaluation plan. If you are planning on running an experiment, how many benchmarks will you evaluate on? How long will you run your tool(s)? If you are implementing a contribution, how will you evaluate that your contribution works correctly?
- Division of tasks. If you are working in a group, what is your plan to divide work on the project?
- Timeline. Determine a timeline you will follow to achieve your implementation and experimental goals, as well as to deliver the writeup + presentation.
Check-in.
The check-in is due nearly a month into the project. It should outline the progress achieved so far. If the timeline's goals have
not been reached, provide an updated timeline. If, in starting the project, a different research direction has emerged
as the more interesting one to follow, the check-in provides a mechanism to update the proposal.
Presentation.
Length. TBD, depending on number of groups.
Contents. The final presentation should convey:
- Background + Motivation. Why is this project important?
- Key details of accomplished approach. If you built a tool/algorithm: what are the key contributions of the tool/algorithm you built? If you are conducting an empirical investigation/case study: what methodology did you adopt? If you made an open-source contribution: what does your contribution enable which was previously impossible?
- Evaluation Results + Conclusion. How did you evaluate your approach? What were the results? What are the high-level takeaways of these results?
Grading. You will be graded on (1) the communication effectiveness of the presentation; (2) the ability to answer questions; and (3) for groups, the fair division of the presentation amongst group members.
Writeup.
Length. Your final report should be in ACM format. Use the "sigconf" option, i.e. "\documentclass[sigconf]{acmart}" at the top of your LaTeX source.
The report should not exceed 10 pages in this format; however, you may include appendices for additional figures/data. As a rough guideline, aim for at least 5 pages total, with roughly 0.75 pages of background/intro and 0.5 pages of related work.
The exact number of pages and the length of each section will depend on the nature of your project. For instance, a project that improves an existing tool may spend more space on the details of the existing tool and the implementation of the improvement
than on less closely related work.
Contents.
Your final writeup should cover the following topics.
- Background. What is the problem domain you are investigating? Or, what is the problem you intend to help address with your implementation? Why is the problem important?
- Related work. How does your approach compare and contrast with related approaches?
Focus only on the details necessary to contextualize your approach in the literature. If building an open-source contribution, is your contribution inspired by any research projects, or by any other open-source contributions/projects?
- Approach. What, precisely, did you build or investigate in the project?
- Evaluation. What research questions will your evaluation answer? How does your experiment design help answer these questions? What are the results of the experiment, and how do those results inform the research questions?
If building an open-source contribution, how have you ensured you have accomplished what you set out to do?
- Discussion. Discuss any observations on the experiment which do not fit in the main evaluation,
any goals that were not achieved, and potential room for improvement.
- Division of tasks. If you are working in a group, how did you divide work on the project? Include who wrote different sections of the writeup.
Grading. The final report will be graded on several points, including (not necessarily equally weighted, and grading may vary between team members):
- the background provided is sufficient to understand the contribution
- the related work is critically analyzed in order to contextualize the contribution
- the overall project approach (as proposed and updated in the check-in) has been accomplished
- the experimental methodology is sound, and the analysis of the experiments has clear takeaways
- the overall clarity of presentation
|
Instructions for Discussion Leads
You will serve as discussion lead for one or two papers in the class. As a discussion lead,
you do not need to submit a response to the paper being read on Piazza. However, you
should read the paper in detail, as you will be the "go-to" person for clarification
questions during discussion. Read through other students' responses, and identify
common themes and interesting questions.
Prepare the following materials.
- A 10-15 minute summary of the key technical contributions of the paper. You
may make a slide deck or use the whiteboard. Caroline will by default project
the paper for reference during discussion.
- Identify 3 common themes, future work suggestions, or points of contention from your reading and the other students' responses. Be prepared to briefly explain these to jump-start discussion.
- Identify any clarification questions from your own reading of the paper and other students' responses. If you are not able to fully clarify some of the questions, feel free to point this out! Collect the parts of the paper or related work you had difficulty understanding. We can work together as a group to clarify them.
|
|
This course is loosely based on Koushik Sen's 2016
offering of CS 294-15.
|