Project

Part 1: Project proposal

The first step is to choose the topic you’d like to explore in your final project. Many types of projects are acceptable, though typical projects involve building a computational model, applying a model to a novel language task, or evaluating existing models.

Initial pitch

Due Tuesday, November 11 at 11:59 p.m.

To prepare for class on November 12, write a one-paragraph project pitch, simply describing the goal of your project – what do you want to do? What’s the research question?

I have some advice about what makes for a good project idea!

The initial proposal should be individual. In class, you’ll be able to discuss project ideas with the instructor and other students, and you’ll have the opportunity to form groups of up to three students who are interested in working on the same project. (You can then choose to just develop one student’s proposal or to combine proposals.)

Task: Submit your proposal pitch PDF on Gradescope.

Full proposal

Due Sunday, November 16 at 11:59 p.m.

Each group will submit a more fleshed-out project proposal (1–2 pages), briefly addressing the following questions:

  • What is the goal of your project? (Try to formulate a specific research question you’re attempting to answer!)
  • What data do you plan to use?
  • What experiments do you want to run?
  • Are there other models or resources you plan to build on?
  • How will you evaluate your success?

You should also include a brief timeline for completing the components of the project you propose.

Note: I suggest designing your project to have an intermediate milestone, e.g., if your project involves training a complex neural model, consider a baseline model you could try first, like logistic regression.
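
As a rough illustration of what such a baseline milestone might look like, here is a minimal sketch of a bag-of-words logistic regression classifier in plain Python. The toy sentences, labels, function names, and hyperparameters are all made up for illustration; a real project would more likely use an existing library and its own dataset.

```python
import math
from collections import Counter


def featurize(text, vocab):
    """Turn a text into a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit binary logistic regression with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(y=1 | x) minus the gold label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient of the log loss.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(x, w, b):
    """Return 1 if the model assigns probability >= 0.5 to the positive class."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy data; replace with your own training set.
train_texts = [
    "great fun movie",
    "loved this great film",
    "boring dull movie",
    "dull and awful film",
]
train_labels = [1, 1, 0, 0]

vocab = sorted({word for text in train_texts for word in text.lower().split()})
X_train = [featurize(text, vocab) for text in train_texts]
w, b = train_logreg(X_train, train_labels)

train_preds = [predict(x, w, b) for x in X_train]
train_accuracy = sum(p == y for p, y in zip(train_preds, train_labels)) / len(train_labels)
print(f"training accuracy: {train_accuracy:.2f}")
```

Whatever form your baseline takes, evaluating it on held-out data with the same metric you plan to use for your main model gives you a point of comparison to report later.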

Task: Submit your full proposal PDF on Gradescope.

Part 2: Literature review

Draft with related work section

Due Wednesday, 19 November at 1:30 p.m.

Most research papers include a section called Related Work, which, at its best, describes scholarly works that are closely related to the current project and explains how the project differs from them. This includes projects solving the same problem in a different way, projects solving slightly different problems but using a similar strategy, projects on which your project builds, etc.

I recommend engaging substantially with 3–4 papers, though you’re free to mention others. You don’t have to thoroughly read every paper you cite, but you should know enough about them to be able to succinctly say how what you’re doing is similar to and different from what they did. While I expect many of the related works will come from NLP papers, I also expect some may come from other domains, like linguistics, gender studies, or political science.

To compile these papers, Semantic Scholar, Google Scholar, and the ACL Anthology are helpful starting points. If you find important papers, textbook chapters, or pages referencing this topic, I’d encourage you to look at the bibliographies of those papers to find out what the core papers are that people cite in this subfield. If you’re not sure where to start, the textbook or Wikipedia page may give you some starting paper links, but don’t let them be your main resource; go find primary sources!

For this deadline, your draft should include:

  • Project title,
  • Introduction section (based on your proposal),
  • Related Work section, and
  • References section.

You should follow the general format for an Association for Computational Linguistics (ACL) conference paper. I recommend using LaTeX – you can use this template – but you can use Google Docs or Microsoft Word if you prefer. If you use LaTeX, you should use BibTeX to store information about the references and cite them in the text – see Overleaf’s documentation.

Task: Submit a PDF (as a team, one per project) on Gradescope.

Draft with related work and methods sections

Due Wednesday, 26 November at 11:59 p.m.
I didn’t have you turn this in, though – hopefully – you fleshed out your paper draft more after the in-class peer review.

In a standard experimental research paper, citations are not only found in the Related Work section; they’re scattered throughout the paper, as they help motivate the introduction, describe the evaluations, and contextualize the results of an experiment.

Based on your project proposal and the literature review you began for the Related Work section, I’d like you to write out a citation-enriched plan of what you’re going to use to assemble your project, including models, evaluations, libraries, datasets, and published strategies. Note that your job here isn’t to justify why these choices are the right ones; it’s to document where those choices came from in the existing literature (which may actually turn out to be all the justification you need).

Try to find primary sources if you can. For instance, if your project uses tf–idf, it’s more appropriate to cite the original Salton & McGill (1986) paper than the description from Jurafsky & Martin. Similarly, for many datasets, there’s an associated paper that introduces the dataset, and it’s appropriate to cite that paper when you first refer to the dataset in your paper; if no such paper exists, however, it’s okay to put the URL in a footnote to indicate where it came from.

For this deadline, your draft should include:

  • Project title,
  • Introduction section,
  • Related Work section,
  • Methods section, and
  • References section.

Part 3: Presentation

In class, December 3 and 8

To present the core problem you’re working on, your approach, and a little of what you’ve done so far, you’ll give a 5–6 minute in-class presentation describing your project. (Teams of three can have an extra two minutes, for up to 8 minutes total.)

This isn’t a lot of time, so you’ll want to quickly get to the core research question you’re addressing and give an idea of what results you have so far.

Your presentation will be graded on the following:

Problem statement (40%)

Does your presentation clearly establish what the problem or research question is that your project addresses? This should be concrete and something for which you can provide evidence: a guiding question like “how does gender affect translation” isn’t a concrete research question, but “how does signaling speaker gender to a machine translation system affect the quality of translations” is.

Background (20%)

Is it clear what existing work you’re building off of or comparing to? You don’t have to mention all the related work, but it’d be good to mention a couple of close neighbors to your project or historical context for your project so someone can understand your specific contribution.

Progress (20%)

Is it clear what you’ve done so far, and what’s left to do? This can be a combination of initial results (with tables and/or plots) and a description of steps left to take.

Note: It can be tempting to spend a while detailing how hard it is proving to be to set a thing up, but it’s difficult for your colleagues and me to learn much about your actual project content from that. It’s okay to mention things that don’t work, but please focus on the things you do have working.

Slides (20%)

Do the slides help communicate your ideas in a clear and effective way? Good slides give enough information to visualize or support the argument you’re making, but won’t necessarily have text for everything you say – in fact, many of my slides for technical presentations have no text at all.

A good rule for slide decks is that most people average about a minute of speech per slide. You should include some sort of plot or figure in your slides showing at least a partial result to get full points.

Part 4: Paper and code

Draft due Wednesday, 10 December at 11:59 p.m.
Submitting a near-final draft by the last day of classes ensures that you don’t miss the hard deadline for final submissions and you have the time to ask questions and address any issues you notice.

Due Sunday, 14 December at 11:59 p.m.
Per Vassar regulations, no work can be accepted after this deadline without special permission from the Dean of Studies office.

Paper

As with the literature review, your final paper should follow the basic format of an ACL conference paper (here’s the template again). Using this format, the final paper should be 3–5 pages long (not counting the references or any appendices). This is longer than you might imagine if you’re used to papers being double-spaced with a 12-point font! Actual ACL short papers are 4 pages of content, which is long enough to concisely report serious research.

Your report should clearly express your approach to addressing a clear research question, how your approach connects with and differs from previous approaches, and both quantitative and qualitative analysis of your results. To match the ACL style, it should also include a (short!) abstract that states in 3–5 sentences what problem you addressed and what a key finding was.

The paper will be graded on the following:

Introduction (10%)
  • Is the research question clearly explained?
  • Is the task clearly articulated, with examples where helpful?
Background/Related Work (15%)
  • Is the research situated with respect to previous work?
  • Is previous work cited properly?
  • Is it clear how your work differs from or builds on existing work?
Methods (25%)

The balance of this section will vary depending on your project, but it should address the following as appropriate:

  • Is the data clearly explained, and was care taken to select high-quality data appropriate to the task?
  • Is the model or approach clearly explained and appropriate to the task?
  • Are design decisions explained clearly?
Results (30%)
  • Is the evaluation metric clearly explained and appropriate to the task?
  • Is the discussion of performance clear and thorough?
  • Is performance contextualized appropriately by discussing baselines and/or previous work?
  • Are trends highlighted and discussed?
  • Are there visualizations of results (tables, figures, etc.)?
  • Does the analysis go beyond “did it work” to address “why did/didn’t it work”?
Conclusion, Limitations & Ethical Considerations (10%)
  • Are the findings summarized concisely?
  • Are limitations of the work discussed?
  • Are ethical considerations addressed, including potential impacts and any concerns related to data, methods, or applications?
General (10%)
  • Is the report well-organized and easy to follow?
  • Has it been proofread?

Codebase

In addition to the paper, you’ll submit the code (and data) you use for your project. Your code should be clearly organized, and it should be documented, both in the code itself (with docstrings and/or comments, as appropriate) and in a README text file that describes, at a minimum, how to run your code.
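
To give a sense of the level of in-code documentation that is helpful, here is a small sketch of a documented evaluation module. The module purpose, function names, and labels are hypothetical and only meant to show one reasonable docstring style; adapt the conventions to your own project.

```python
"""Evaluation helpers for a classification project (hypothetical example module).

A short module docstring like this one says what the file is for. The README
should then explain how the files fit together and how to run them.
"""
from collections import Counter


def accuracy(gold, pred):
    """Return the fraction of predictions that match the gold labels.

    Args:
        gold: list of gold-standard labels.
        pred: list of predicted labels, the same length as `gold`.

    Raises:
        ValueError: if the lists are empty or differ in length.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def majority_class(labels):
    """Return the most frequent label, for use as a trivial baseline prediction."""
    return Counter(labels).most_common(1)[0][0]


# Small demonstration with made-up labels.
gold_labels = ["pos", "neg", "pos", "pos"]
baseline_preds = [majority_class(gold_labels)] * len(gold_labels)
print(f"majority-class accuracy: {accuracy(gold_labels, baseline_preds):.2f}")
```

Docstrings describe what each function expects and returns, while inline comments are best saved for steps whose purpose is not obvious from the code itself.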

The code will be graded on the following criteria (adapted as appropriate to your project):

  • Does the project involve a substantial engineering effort?
  • Does the code successfully run the models?
  • Is the evaluation metric appropriate?
  • Does the code evaluate model performance?
  • Is the code commented and organized?
  • Is there a README that describes how to run the code?