Overview

Intercoder reliability is a measure of agreement or consistency among two or more coders who independently code a transcript. It indicates whether the coders apply the same codes to the same text in the same way. Intercoder reliability may be useful in situations such as:

Publication or reporting requirements where reliability metrics are required
Training new researchers to ensure alignment with a shared codebook
Deductive qualitative analysis where codes are predefined and consistency is important

To calculate an intercoder reliability score in Delve, two or more researchers must code the same transcript. Follow the steps below to calculate intercoder reliability.

Calculate Intercoder Reliability

Step 1: Create a project with a codebook

Intercoder reliability measures how your team applies the same codebook. Before you begin coding, you will need a codebook with well-defined code descriptions.

Step 2: Invite your team to the same project

After creating the project with a codebook, invite your team using the share button in the upper right-hand corner.

Step 3: Code the transcript using the Coded By Me view

Your research team should code the same transcript without looking at each other’s work. They can do this using the Coded By Me feature, which hides all other team members’ coding.

Note: For intercoder reliability, teams should agree in advance on whether coders are allowed to add new codes. This method is most appropriate for deductive coding where the codebook is predefined.

Step 4: Calculate your intercoder reliability score

To calculate intercoder reliability, open the Transcript dropdown and click Coding Comparison.

Next, click Intercoder Reliability.

This will generate your intercoder reliability score for that transcript.

FAQs

Q: Can Delve provide an Intercoder Reliability Score for an entire project?

A: No, at this time Delve’s Intercoder Reliability Score is for use on individual transcripts, not across entire projects.

Q: How is the Intercoder Reliability Score calculated?

A: The intercoder reliability score is calculated using Krippendorff’s Alpha, a standard statistical measure for intercoder reliability.

Krippendorff’s Alpha ranges from -1 to 1:

1 = perfect agreement
0 = agreement no better than chance
< 0 = worse than chance

Krippendorff’s Alpha uses a weighting scheme to decide how much agreement to credit when coders apply similar but not identical codes.

Delve uses Jaccard-based weighting to account for partial overlap between code sets.

This means agreement is not all-or-nothing when multiple codes are applied. Instead, agreement is proportional to how much overlap exists between coders’ code selections.

The Jaccard index is calculated as:

|A ∩ B| / |A ∪ B|

Where:

A = codes applied by coder 1
B = codes applied by coder 2
A ∩ B = shared codes
A ∪ B = all unique codes across both coders

A score of 1 indicates identical code sets and a score of 0 indicates no shared codes. Values in between reflect partial overlap.
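
As a rough illustration, the Jaccard index can be computed in a few lines of Python. This is a minimal sketch and not Delve's implementation. The function name jaccard_index and the decision to return 1.0 when both code sets are empty are assumptions made for this example.

```python
def jaccard_index(codes_a, codes_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of code names."""
    a, b = set(codes_a), set(codes_b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty code sets count as identical.
        return 1.0
    return len(a & b) / len(union)


print(jaccard_index({"Fear"}, {"Fear"}))             # identical sets: 1.0
print(jaccard_index({"Fear"}, {"Fear", "Anxiety"}))  # partial overlap: 0.5
print(jaccard_index({"Fear"}, {"Joy"}))              # no shared codes: 0.0
```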

Q: What are the benefits of Krippendorff's Alpha?

A: Krippendorff’s Alpha has several advantages over simpler measures like percent agreement:

Any number of coders: Supports multiple coders in the same calculation
Handles missing data: Coders do not need to code exactly the same segments
Adjusts for chance: Accounts for agreement that may occur randomly
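
The three properties above can be seen in a small Python sketch of the general Krippendorff's Alpha formula, alpha = 1 - (observed disagreement / expected disagreement). This is an illustration under stated assumptions, not Delve's actual code: the function names are hypothetical, each segment is represented as a list of the code sets applied by the coders who coded it, and disagreement between two code sets is assumed to be 1 minus their Jaccard index.

```python
from itertools import permutations


def jaccard_distance(codes_a, codes_b):
    """Disagreement between two code sets: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(codes_a), set(codes_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets do not disagree
    return 1.0 - len(a & b) / len(union)


def krippendorff_alpha(units, distance=jaccard_distance):
    """Return alpha for a list of segments, or None if it is undefined.

    units: one list per segment, holding the code set from each coder
    who coded that segment. Segments may have different numbers of coders.
    """
    # Missing data: segments with fewer than two coders cannot be paired.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        return None

    # Observed disagreement: pairs of coders within the same segment.
    observed = sum(
        sum(distance(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: all pairs of values, regardless of segment.
    # This is the chance correction.
    expected = sum(
        distance(a, b) for a, b in permutations(values, 2)
    ) / (n * (n - 1))

    if expected == 0:
        return None  # every coder used the same code set everywhere
    return 1.0 - observed / expected


segments = [
    [{"Fear"}, {"Fear", "Anxiety"}],  # two coders, partial overlap
    [{"Joy"}, {"Joy"}],               # two coders, full agreement
    [{"Anger"}],                      # one coder only, so it is skipped
]
print(round(krippendorff_alpha(segments), 3))  # 0.667
```

In this sketch the function returns None rather than a number when no segment has two coders or when there is no variation in the codes used, because the expected disagreement is then zero and the ratio is undefined.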

Q: What is a good intercoder reliability score?

A: A good intercoder reliability score depends on the context of your research. However, a score of 0.8 or higher is commonly considered strong agreement.

Q: Do my team's snippets need to perfectly overlap to be included in the score?

A: No. Coders do not need to perfectly align on snippet boundaries. Delve accounts for overlapping coded segments when calculating the score.
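
Delve's exact method for aligning overlapping snippets is not described here. One common approach, shown below as a hypothetical Python sketch, is to cut the transcript at every snippet boundary so that each resulting span has one well-defined code set per coder. The function name split_into_segments, the tuple layout (coder, start, end, code), and the use of character offsets with an exclusive end are all assumptions made for this example.

```python
def split_into_segments(snippets):
    """Split coded text into spans on which each coder's codes are constant.

    snippets: list of (coder, start, end, code) tuples, where start and end
    are character offsets and end is exclusive.
    Returns a list of (start, end, {coder: frozenset_of_codes}) tuples.
    """
    # Every snippet start or end is a potential span boundary.
    boundaries = sorted({p for _, s, e, _ in snippets for p in (s, e)})

    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        codes_by_coder = {}
        for coder, s, e, code in snippets:
            # A snippet applies to this span if it fully covers it.
            if s <= start and end <= e:
                codes_by_coder.setdefault(coder, set()).add(code)
        if codes_by_coder:  # skip spans that nobody coded
            frozen = {c: frozenset(v) for c, v in codes_by_coder.items()}
            segments.append((start, end, frozen))
    return segments


snippets = [
    ("Coder A", 0, 10, "Fear"),
    ("Coder B", 5, 15, "Fear"),
    ("Coder B", 5, 15, "Anxiety"),
]
for segment in split_into_segments(snippets):
    print(segment)
```

Here the two coders' snippets overlap only on characters 5 to 10, so that span is the only one where both coders contribute a code set. The spans 0 to 5 and 10 to 15 each have a single coder.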

Q: What happens if only one person codes a piece of text?

A: If only one person codes a piece of text, it is excluded from the Krippendorff’s Alpha calculation. Only segments coded by at least two coders are included.

Q: Are there scenarios where the intercoder reliability score cannot be calculated?

A: There are a few cases where the score cannot be calculated:

Only one person (or nobody) has coded the transcript
No two coders have coded overlapping text
Coders have only used a single code in total

Q: How does applying more than one code to a segment affect the intercoder reliability score?

A: If your team is conducting a strict intercoder reliability test, it is recommended to use a codebook where codes are conceptually distinct so that each segment receives a single primary code.

However, Delve supports multi-code segments, and these are included in the calculation.

When multiple codes are applied, agreement is calculated based on overlap between code sets rather than requiring an exact match.

For example:

Coder A: {Fear}
Coder B: {Fear, Anxiety}

The shared code Fear contributes to agreement, while the additional code Anxiety reduces similarity.

Using Jaccard weighting:

Intersection = 1
Union = 2
Agreement = 1/2 = 0.5

This allows partial agreement to be reflected in the final score, rather than treating the coding as completely different.

Q: Does my team have to use Intercoder Reliability?

A: No, not all qualitative research requires intercoder reliability. Some researchers argue that strict reliability measures can reduce depth and interpretive richness in qualitative analysis.

Instead of focusing solely on a quantitative score, teams may also use Delve’s Coding Comparison feature to discuss and compare interpretations directly.