- What software should you choose?
- What functionalities should you look for?
- Available software
Linguists use software primarily to accomplish some professional goal, such as data analysis. In this context, it is tempting to focus only on immediate goals: issues such as interface, suitability for the analysis at hand, and ease of printing can claim undue prominence. While these issues are important, other considerations matter more for the long-term intelligibility of the data.
The most important consideration is posterity: the language being documented may not be spoken in another century, or even in another decade. The material you produce will then be the only remaining record of the language. It is therefore vital to consider whether the software produces documentation in a format that is both open and in general use, and thus likely to be easily reusable by later scholars. This means that the format the software writes must be an open standard, interpretable by many different pieces of software and not just the one that wrote it (Bird and Simons 2003).
This page sets out some considerations to bear in mind when choosing linguistic software, and provides links to recommended linguistics software available online.
- Use software which exports documentation and description in formats that are open (i.e. whose specifications are published and non-proprietary).
- Prefer software which outputs formats supported by software tools available from multiple suppliers.
- Where possible, prefer tools which are free over those from commercial suppliers.
- If you must use proprietary software, prefer software which outputs published proprietary formats (e.g. Adobe Portable Document Format (PDF) and MPEG-1 Audio Layer 3 (MP3)) over secret proprietary formats (e.g. most formats produced by Microsoft products).
- Prefer software that generates XML (with an accompanying DTD or Schema) over other schemes of descriptive markup.
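As a rough illustration of the last recommendation, the sketch below writes a small set of lexical records as XML using Python's standard library. The element names (`lexicon`, `entry`, `form`, `gloss`), the function name `records_to_xml`, and the sample record are all hypothetical placeholders rather than a recommended schema; a real project would also publish a DTD or Schema describing whatever structure it adopts.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialize a list of record dicts as an XML string.

    The element and attribute names are illustrative assumptions,
    not an established standard.
    """
    root = ET.Element("lexicon")
    for rec in records:
        entry = ET.SubElement(root, "entry", id=rec["id"])
        ET.SubElement(entry, "form").text = rec["form"]
        gloss = ET.SubElement(entry, "gloss", lang=rec["gloss_lang"])
        gloss.text = rec["gloss"]
    # encoding="unicode" returns a Python str rather than bytes
    return ET.tostring(root, encoding="unicode")


# Placeholder record: an invented form, not data from a real language.
records = [
    {"id": "e1", "form": "ŋala", "gloss_lang": "en", "gloss": "water"},
]
xml_text = records_to_xml(records)
print(xml_text)
```

Because the output is plain text with descriptive markup, any XML-aware tool can read it back, not only the program that wrote it.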
Before choosing software, it is important to decide which functionalities are most essential: no software does it all. We know of no single piece of software, for example, which allows you to link stretches of an audio or video file to text and which also allows analysis of richly structured data.
Note: The need to use more than one tool to secure the results you want is an additional argument for storing data in open formats. Text files with XML markup are the easiest to import and export.
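To make the point about moving data between tools concrete, here is a minimal sketch that reads XML produced by one (hypothetical) tool and re-exports it as tab-separated text for another. The XML structure, the sample entries, and the function name `xml_to_tsv` are assumptions made for illustration only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export from a first tool; the forms are invented placeholders.
XML_TEXT = """<lexicon>
  <entry id="e1"><form>ŋala</form><gloss lang="en">water</gloss></entry>
  <entry id="e2"><form>tiku</form><gloss lang="en">stone</gloss></entry>
</lexicon>"""


def xml_to_tsv(xml_text):
    """Convert the assumed <lexicon> structure to tab-separated text."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "form", "gloss"])
    for entry in root.iter("entry"):
        writer.writerow(
            [entry.get("id"), entry.findtext("form"), entry.findtext("gloss")]
        )
    return buffer.getvalue()


tsv_text = xml_to_tsv(XML_TEXT)
print(tsv_text)
```

A conversion of this kind is only possible because the source format is open and documented; a secret binary format would require the original program to read it.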
The Berkeley Initiative for Computationally Assisted Linguistics (BIFoCAL) has provided a useful list of functionalities to consider and advice on choosing software, from which the following information is drawn.
- Data archiving: What is the primary storage format? What formats can the data be exported in? [Look for the ability to export in plain text, with XML markup.]
- Character formats: Simple ASCII transcription (no special characters)? Unicode support? [Look for Unicode support if you need more than the standard set of English characters.]
- Data preparation: Input environment (laptop? desktop? web-based input?), Collaboration, Parsing/Interlinearizing, Alignment of text to audio and/or video recordings, Multiple glossing languages
- Data analysis: Selection of specific subset of data, Sorting, Searching, Concordancing, Counting (fine-grained numerical analysis), Comparison with data from another source. [Many pieces of software support limited subset selection, sorting, searching, and counting; but few, if any, have sophisticated capabilities in all these areas.]
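Some of the analysis functions listed above (searching, concordancing, counting) can be sketched in a few lines once the data is in Unicode plain text. The example below is a simplified illustration, not a substitute for dedicated software: it splits on whitespace only, the function names are hypothetical, and the sample lines are invented placeholders. It normalizes text to Unicode NFC first so that the same character typed in two different ways is counted as one.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """Normalize to NFC, lowercase, and split on whitespace."""
    normalized = unicodedata.normalize("NFC", text)
    return normalized.lower().split()


def count_tokens(lines):
    """Count how often each token occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def concordance(lines, keyword, width=2):
    """Return (line number, left context, keyword, right context) tuples."""
    target = unicodedata.normalize("NFC", keyword).lower()
    hits = []
    for line_no, line in enumerate(lines, start=1):
        tokens = tokenize(line)
        for i, token in enumerate(tokens):
            if token == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((line_no, left, token, right))
    return hits


# Invented sample text; the second line spells "ñ" as n + combining tilde.
lines = ["ŋala tiku ŋala", "tiku n\u0303ama ŋala"]
token_counts = count_tokens(lines)
tiku_hits = concordance(lines, "tiku")
```

Without the normalization step, a form typed with a combining diacritic and the same form typed with a precomposed character would be treated as different words, which is one reason Unicode support matters when evaluating software.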
Linguistics software is being developed and improved all the time, with increasing recognition of the Best Practice (BP) guidelines. The school's toolroom contains a searchable database of linguist-reviewed software, with each tool assessed according to its conformity with BP.