Coreference Resolution

Identifying references to the same thing

Crowdsourcing and Data Annotation

Including non-experts in data creation and system functions


Playing the board game Diplomacy

Executable Semantic Parsing

Generating code that represents the meaning of text

Language Models

Work on making and using vector representations of text

Old Blog Posts

Blog posts from my old website


Various topics

I’m trying a new approach to reading literature. I try to read enough of one paper each work day to get the key idea and then add some content here if I want to remember it. My hope is that this helps me get more out of my reading by forcing me to identify what mattered most to me in the paper and to link it to other things I have read.

Note, this is not a literature review. I am not aiming to be comprehensive. The papers I read and write about reflect my interests, biases, and opinions. I also don’t completely summarise the work, but rather focus on the aspects that I want to remember. The pages are also in various states. Some are fairly detailed, others are quite sparse or contain just a list of papers I plan to read / reread and write about.

Reading papers

Advice from elsewhere:

To help me identify the papers I want to read, I have been using the following method (in Chrome on macOS):

  1. Go through the proceedings for a conference on the ACL anthology and read every title. Based on the title, decide whether to read the abstract. Based on the abstract, decide whether to read the introduction, in which case open the paper in a tab.
  2. Bookmark all tabs. Either use Shift+Command+D or Bookmarks -> Bookmark All Tabs.
  3. Export the folder of bookmarks to a file. To do this, go to chrome://bookmarks, select the new folder then use the menu on the far right of the blue bar to select Export Bookmarks.
  4. Run the code below with bookmarks_DATE.html on standard input (note, requires PyPDF2 and requests). This produces a PDF with only the introduction of each paper (approximately).
  5. Read through the PDF this produces and flag the papers to read in full.

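As an aside on step 3, the exported file stores each bookmark as a single `<A HREF="...">title</A>` entry, and the script relies on that layout. The sketch below is not part of my script: it shows the same title and URL extraction done with the standard library's HTML parser instead of string splitting. The names `BookmarkLinks` and `anthology_pdfs` and the sample entry are made up for illustration.

```python
from html.parser import HTMLParser


class BookmarkLinks(HTMLParser):
    """Collect (title, url) pairs from <A HREF="..."> bookmark entries."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def anthology_pdfs(bookmarks_html, suffix=" - ACL Anthology"):
    """Map paper title -> PDF URL for bookmarks whose title ends with suffix."""
    parser = BookmarkLinks()
    parser.feed(bookmarks_html)
    papers = {}
    for title, url in parser.links:
        if title.endswith(suffix):
            # Same convention as the script: drop the trailing '/' and add '.pdf'
            papers[title[:-len(suffix)]] = url.rstrip("/") + ".pdf"
    return papers


# Made-up entry in the shape of a Chrome bookmarks export line
sample = ('<DT><A HREF="https://aclanthology.org/P00-0000/" ADD_DATE="0">'
          'A Made-Up Paper - ACL Anthology</A>')
print(anthology_pdfs(sample))
```

Bookmarks for other sites are skipped because their titles do not end with the Anthology suffix, so a mixed folder of tabs should still work.
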
# Get the paper URLs
import sys
papers = {}
for line in sys.stdin:
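    # Assumed filter: only bookmark entries for ACL Anthology pages have the
    # HREF and title that the next lines split on
    if 'HREF="' in line and ' - ACL Anthology' in line: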
        content = line.strip()
        url = content.split()[1].split('"')[1][:-1] + ".pdf"
        name = content.split(" - ACL Anthology")[0].split(">")[-1]
        papers[name] = url

# Download the papers
import io, requests
PDFs = {}
for name, url in papers.items():
    # The ACL Anthology serves PDFs publicly, so no credentials are needed
    r = requests.get(url, timeout=60)
    r.raise_for_status()
    PDFs[name] = io.BytesIO(r.content)

# Get the Introductions
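# These are the PyPDF2 1.x/2.x names. PyPDF2 3.0 removed them in favour of
# PdfReader, PdfWriter, reader.pages[i], extract_text() and add_page().
from PyPDF2 import PdfFileReader, PdfFileWriter
import string

def starts_section_2(line):
    # A section 2 heading is extracted as '2' directly followed by a letter
    return line.startswith('2') and len(line) > 1 and line[1] in string.ascii_letters

pdf_writer = PdfFileWriter()
for name, raw_pdf in PDFs.items():
    pdf = PdfFileReader(raw_pdf)
    # Assumption: the first page is always kept, since the introduction starts there
    page0 = pdf.getPage(0)
    pdf_writer.addPage(page0)
    # If section 2 starts on the first page, the introduction ends there
    done = any(starts_section_2(part) for part in page0.extractText().split('\n'))
    if not done and pdf.getNumPages() > 1:
        page1 = pdf.getPage(1)
        start = page1.extractText().split('\n')[0]
        # Assumption: keep the second page unless it opens with the section 2 heading
        if not starts_section_2(start):
            pdf_writer.addPage(page1)
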
with open('example.pdf', 'wb') as out: