Transcription: machine or manual?

Transcribing research interviews is often thought of as an unfortunate chore. Machine-learning-based transcription services have become popular because they automate the first step of the transcription process, enabling researchers to correct the first transcription draft and move on to coding and analysis more quickly.

When you use a transcription service, you are trusting that company’s current data privacy policy, and you are accepting the risk that the company may retain your data and change its policy in the future. The only way to ensure that your data is not accessed, processed, or sold by another organization is to store it locally.¹ It’s useful to consider how four common types of cloud business models shape the use of your data:

Type	Example	Revenue Source	Data Implications
Free use	Google	Surveillance-driven advertising	Content and metadata are analyzed and sold to the highest bidder
Subscription service	Otter.ai	Subscriptions from individuals & organizations	Content and metadata may be analyzed for the company’s use, or sold
LLM	ChatGPT	Venture capital	Data policies are rapidly changing and unstable
Enterprise software suite	Microsoft	Contracts with large-scale organizations	Content and metadata are analyzed for the company’s use

Huge cloud platforms like Google process your data and sell these analytics. Subscription services like Otter.ai and Microsoft Office use your data to “train and improve” their services.² This means that they still retain access to your information, and they may share it with third parties at their discretion.³ LLMs work in a similar way: the main difference is that no company has established a sustainable business model for an LLM, so these companies may be more likely to weaken data privacy policies to pursue additional revenue sources. Most, if not all, of these companies use enterprise data services based in the United States, and any data stored on these servers may be accessed by the US government with a court warrant or subpoena. Even companies that claim that your data will remain in your country of residence may not be able to prevent US authorities from accessing it.⁴

The implications of these data use regimes will vary depending on the subject of your research.⁵ Researchers critical of hegemonic nation-state actors may wish to avoid using these tools because of security concerns over data access.⁶ Others may want to refrain from using these tools because these companies participate in surveillance economies.⁷ Regardless of your choice, you should be aware of the data privacy implications of these tools and receive express consent from participants to process their conversations with third-party software services.

Manual transcription and slow scholarship

My most recent experience with transcription was part of a project studying the place-making practices of queer and trans youth. My research did not demand stringent security practices, but surveillance capitalism and the digital geopolitics of cloud services give me the ick. So, I chose to manually transcribe my approximately 27 hours of interview recordings. It took me about 2 hours to transcribe each hour of recording, or about 50 hours total.

I was happy to find that transcribing my own interviews gave me the chance to listen to my participants again and again, encouraging me to carefully decide how to interpret and represent their stories within the complex contexts of our conversations, their lives, and our shared world. Mountz et al. (2015)⁸ remind us that slowness can be an epistemic and ethical virtue (1237):

Given the chance to marinate, ideas ripen, often resulting in some of our most thoughtful, provocative, and important work. Good scholarship requires time: time to think, write, read, research, analyze, edit, and collaborate.

Of course, we do not always seem to have the time that good scholarship requires as we negotiate the decaying edifice of the neoliberal university. Organizing and collective action are necessary to address the root of the problem. Still, it is worth reflecting:

Do our own attitudes reproduce neoliberal logics of extraction and ‘productivity’?
What if we re-framed our imagination of transcription from a rote chore to a deeply engaged process of listening?

Transcription can be an artful exercise not unlike translation. People, after all, do not usually speak how they would write. The choices we make in how we represent participants’ speech (how we divide run-on sentences, correct errors, contextualize slang words, or translate characteristics of particular dialects) carry ethical and political weight. What might we gain by reclaiming the value of our labor and taking our research, including our transcription, slowly?

Setting up transcription tools for getting in the flow

Okay, so you want to transcribe manually. The first thing you should consider is the ergonomics of your computer,⁹ because you’ll probably be doing this for a while. Once you have your machine set up properly, you will need:

A word processor. I recommend Obsidian because of its simplicity, flexibility, and privacy.
An audio playback tool. VLC is especially useful because it can be configured to make transcription more efficient. It is also open-source, widely supported, and does not collect your data. Neat.

I learned that I transcribe best when I can get into the flow of the conversation, and so I adjusted my transcription process to eliminate interruptions as much as possible. The most frequent reason I would take my hands off the keyboard was to pause or rewind the recording audio. Maybe I didn’t quite catch what the participant said, or I was thinking about how to break up a run-on sentence with many ‘ands’ and ‘buts’. So, I set up VLC with “global shortcuts” so I could control playback without taking my hands off the keyboard or leaving the document.

Here’s how:

Start VLC and open the preferences menu (tools --> preferences, or ctrl+p)
Open the hotkeys submenu
Add the following hotkeys in the “global” (rightmost) column:
- “Short backwards jump” ctrl+shift+j
- “Short forwards jump” ctrl+shift+k
- “Play/pause” ctrl+shift+space
- Optional: “faster” ctrl+shift+d
- Optional: “slower” ctrl+shift+f
Enable “always on top” (view --> always on top)
Shrink the playback window and tuck it in the corner of your screen
Caveat: you may need to play around with different shortcuts, as some can overlap with program- or system-level hotkeys for different computers.

This will let you control the playback without taking your hands off the keyboard while keeping a small playback window visible so you can monitor playback and note timestamps in your transcript. Open a file and adjust the playback speed to something manageable (either by hotkey or through the menu playback --> speed --> slower/faster), and you’re good to go! Don’t worry so much about typos or spelling accuracy on your first draft; it’s good practice to return later with fresh ears to review what you’ve written. Proofing your transcripts can usually be done at a faster playback speed.
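If you are budgeting time for a proofing pass, the listening time is simply the recording length divided by the playback speed. Here is a minimal sketch of that arithmetic. The function name and the 1.5x speed are assumptions chosen for illustration, not figures from my own process, and the result covers listening only, not the time spent pausing to make corrections.

```python
def listening_hours(recording_hours: float, playback_speed: float) -> float:
    """Hours needed to listen through a recording once at a given speed.

    This is a lower bound: it ignores pauses, rewinds, and corrections.
    """
    if playback_speed <= 0:
        raise ValueError("playback_speed must be greater than zero")
    return recording_hours / playback_speed


# Hypothetical example: 27 hours of recordings proofed at 1.5x speed.
print(listening_hours(27, 1.5))
```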

It’s worth keeping a style guide as you go with notes about your transcription process. Consider:

How will I note timestamps in my transcript, so it’s easier to return later to check for accuracy?
Will I lightly edit responses to conform with written grammar, and if so, to what extent?
How will I divide speech into paragraphs?
Will I indicate more-than-verbal communication (e.g. laughs, non-lexical vocalizations, tone of voice) in my transcript? If so, how?
Will I use italics to indicate tonal shifts or emphasis? If so, under which circumstances?
Will I add context for slang or dialect characteristics for my audience? If so, will I do this in the interview text itself (e.g. [with brackets]) or will I do it in the discussion following the interview excerpt?
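Whatever timestamp convention you settle on, it helps to keep it consistent so you can search for markers later. As one possible convention (an assumption, not a standard), the sketch below writes a playback position as a bracketed [hh:mm:ss] marker and converts a marker back to seconds. The function names are made up for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Return a bracketed [hh:mm:ss] marker for a playback position in seconds."""
    total = int(seconds)  # drop fractions of a second
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def parse_timestamp(marker: str) -> int:
    """Convert a [hh:mm:ss] marker back into a number of seconds."""
    hours, minutes, secs = (int(part) for part in marker.strip("[]").split(":"))
    return hours * 3600 + minutes * 60 + secs


# A position 3725 seconds into the recording is 1 hour, 2 minutes, 5 seconds.
print(format_timestamp(3725))
print(parse_timestamp("[01:02:05]"))
```

Converting markers back to seconds is handy when you return to check a passage, since you can jump straight to that position in your audio player.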

While you’re transcribing, remember to take breaks — every 15 or 20 minutes worked best for me. You may want to:

Rest your eyes & stretch your hands.
Note important passages by bolding sections of the transcription document or with timestamps in another document.
Write down some of the themes you’re noticing in the conversation.

The themes I wrote down during this stage actually became the first iteration of my codebook, and the passages I identified became some of the anchors for the analytical sections of my thesis. In this way, an intentionally slow transcription process can protect the security of your research and the privacy of your participants while enriching your research process and giving you the opportunity to listen more deeply to your participants. Good luck!

¹ Regardless of how you choose to store your data, you should make frequent backups! A good rule of thumb is that you should have 3 different backups on at least 2 different storage media, with at least one off-site. For example, you might have one backup copy on your laptop (make sure your device is encrypted!), another on an encrypted USB drive in your office, and a third on a secure cloud service such as ProtonDrive (caveats about cloud storage still apply).
² Otter.ai privacy policy, Section 2: “We train our proprietary artificial intelligence technology on de-identified audio recordings. We also train our technology on transcriptions to provide more accurate services, which may contain Personal Information. We obtain explicit permission (e.g. when you rate the transcript quality and check the box to give Otter.ai and its third-party service provider(s) permission to access the conversation for training and product improvement purposes) for manual review of specific audio recordings to further refine our model training data.”
³ Otter.ai privacy policy, Section 4: your data may be shared with “Cloud service providers… including Amazon Web Services, based in the United States”, “Data labeling service providers who provide annotation services…”, “Advertising Partners… to show you ads that we think may interest you”, and “Mobile advertising tracking providers who help us measure our advertising effectiveness, including AppsFlyer which is based in Israel.” I guess this means that Otter.ai is not BDS-compliant?
⁴ This short article discusses how insights from recent hearings on data sovereignty in France may apply to the Canadian context.
⁵ This threat modeling guide for activists is a useful exercise for thinking through your risk tolerance.
⁶ The experience of this human rights reporter is indicative.
⁷ Surveillance capitalism: “A parasitic economic logic in which the production of goods and services is subordinated to a new global architecture of behavioral modification”. Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism. Public Affairs: 1.
⁸ Mountz, Alison, et al. 2015. “For Slow Scholarship: A Feminist Politics of Resistance through Collective Action in the Neoliberal University.” ACME: An International E-Journal for Critical Geographies 14(4): 1235–1259.
⁹ The Canadian Centre for Occupational Health and Safety has a good primer here.

wiley sharp

Notes on transcription
