We use the term “shepherd” for the person who takes ultimate responsibility for keeping a particular text moving forward. They don’t have to do all the work, or even the majority of it; they just need to be willing to help move things along.

While we will probably expand this at some point to include translation alignment, POS tagging, treebanking, sembanking, and more, the initial milestones for a text are:

  1. get a clean version with a referencing scheme in our format
  2. lemmatise the text and resolve any ambiguities in the lemmatisation

Getting Started

Putting Together the Initial Text

  • check Scaife (https://scaife.perseus.org/library/) to see if the text you want to work on is already in Perseus/OGL
  • if so, find the repo in GitHub and get the relevant TEI XML
  • if not, you’ll need to track down an openly licensed version (public domain / CC0, CC BY, or CC BY-SA)
  • there may be quite a lot of correction to do (see Preparing an Open Apostolic Fathers for what we did for the Apostolic Fathers)
  • scans may be available on archive.org or elsewhere, or you may even need to scan the text yourself. If you need any help with OCR, just ask in the #ocr channel
  • [James needs to expand this a lot but] get it in our standard “reference text-part” form, one line per referenceable text-part
  • set up a text-validator.toml and validate the text with text-validator (preferably set up a GitHub Action to do this on all commits and PRs)
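
As a rough illustration of the “one line per referenceable text-part” step, here is a minimal Python sketch. The exact line layout is not specified above, so the layout used here (a reference with no internal whitespace, a single space, then the text) is an assumption, and the function name `to_reference_lines` is hypothetical. It is not a substitute for text-validator; it only shows the kind of tidying involved.

```python
import re


def to_reference_lines(parts):
    """Turn (reference, text) pairs into one line per text-part.

    Assumed layout (not confirmed by the project docs):
    '<reference> <text>', with the reference containing no whitespace.
    """
    seen = set()
    lines = []
    for ref, text in parts:
        # A reference must be present and must not contain whitespace,
        # otherwise it cannot be split cleanly from the text.
        if not ref or re.search(r"\s", ref):
            raise ValueError(f"bad reference: {ref!r}")
        # Each text-part should be referenceable by exactly one reference.
        if ref in seen:
            raise ValueError(f"duplicate reference: {ref}")
        seen.add(ref)
        # Collapse line breaks and runs of whitespace so the text-part
        # fits on a single line.
        normalised = " ".join(text.split())
        if not normalised:
            raise ValueError(f"empty text for reference {ref}")
        lines.append(f"{ref} {normalised}")
    return lines


if __name__ == "__main__":
    sample = [
        ("1.1", "first   text-part,\nwrapped over two lines"),
        ("1.2", "second text-part"),
    ]
    print("\n".join(to_reference_lines(sample)))
```

Failing early on duplicate or malformed references keeps problems visible before the text goes anywhere near validation, where they would be harder to trace back to the source edition.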

Lemmatising the Text

  • [James still to write]