A person wearing a virtual reality headset and light-colored hoodie interacting with glowing digital graphics. Both hands are raised, with fingers pointing toward floating blue and red holographic interface elements on a dark background.

Project 3: Co-Designing Accessible Captioning for Immersive 360° VR Video

This ongoing research lab study examines the captioning needs of Deaf and Hard-of-Hearing (DHH) adults through literature reviews, multi-phase co-design, prototype development, VR testing, and interviews, and is on track for publication.

Conducted via DePaul University Accessibility Research Lab

Project Timeline:

Project Type: Immersive Research & Co-Design

Primary Tools: Captionfy, Bezi, Prolific, ATLAS.ti, TurboScribe, Adobe Premiere, and Overleaf

My Role: Research Lab Assistant 

Duration: Over 1 year

Project Overview 

Background: What is This Research Lab Project?

Poster titled “Accessible VR Captioning for DHH Users” with graphics of people wearing VR headsets and glasses. The poster describes a multi-phase study from DePaul’s Accessibility Research Lab. Phase 1: Explored captioning design values with DHH participants using simulated 360° VR videos. Phase 2: Refined methodology and VR testing strategies. A central closed caption (CC) icon is shown between the phases. The color scheme uses teal, mustard yellow, and orange tones with illustrated characters.

As part of a larger, multi-phase study housed in DePaul’s Accessibility Research Lab, our work explores how captions can be designed for virtual reality (VR) to help users stay comfortable and feel fully immersed in virtual environments.

 

In Phase 1, our team explored captioning design values with Deaf and Hard-of-Hearing (DHH) participants using simulated 360° videos in virtual reality (VR).

 

In Phase 2, we extended that work by refining both the methodology and intervention strategies, expanding co-design tasks, and preparing for VR testing of caption prototypes in a future study.

Problem:

 Unlike standard 2D videos, 360° videos in VR allow users to control their gaze, introducing unique challenges for DHH users such as: 

Four illustrated icons with text underneath:  #1. Yellow-and-black striped roadblock with a "CC" symbol in the center. Text below: "Difficulty avoiding obstructive caption placement."  #2. Brain split in two colors (teal and red) with three question marks above it. Text below: "Attention-splitting captions in 360° videos."  #3. Orange puzzle pieces with a magnifying glass overlaid. Text below: "Lack of research on DHH caption preferences for VR."  #4. Green outline of a head wearing a VR headset with gear cogs. Text below: "Exclusion from fully engaging with VR content."

Research Questions:

To address these gaps, the following research questions were developed for Phase 1:

Green head profile with a heart inside and dotted green lines representing speech coming out. Text below:

How do participants envision captioning within-view speakers?

Red head profile with a heart inside and dotted red lines representing speech directed outward. Text below:

How do participants envision captioning off-view speakers?

Illustration of a person shrugging with a pink question mark speech bubble on the left and a green question mark speech bubble on the right. Text below:

What factors do participants discuss as important for their preference?

Goals:

In Phase 1, the study aimed to explore the preferences of DHH participants for captioning in immersive 360° videos by:

Image titled "Phase 1" with two illustrated icons and text below:  #1. Illustration of a turquoise laptop. Text below: "Co-design caption solutions for within & out-of-view speakers."  #2. Illustration of a warning triangle with an exclamation mark, a "CC" closed captions symbol, and a magnifying glass with check marks and an X. Text below: "Identify captioning barriers related to visibility, styling, and interactions."

In Phase 2, the goals shifted toward:

Image titled "Phase 2" with three illustrated icons and text below:  #1. Illustration of a red gear, lightbulb, and user figure connected with arrows. Text below: "Refine Study Protocols."  #2. Illustration of three people collaborating over a laptop. Text below: "Expand co-design tasks to explore caption behavior and dynamic transitions."  #3. Illustration of a person wearing a VR headset surrounded by circular arrows. Text below: "Finalize high-fidelity prototypes for in VR testing."

My Contribution:

I joined the project during the interview stage of Phase 1, and then took on a larger role in Phase 2. 

Infographic titled “Planning” with four vertically stacked teal boxes under a lightbulb icon. Each box contains white text:  #1. "Recruited DHH participants using personal connections, Prolific, and Reddit"  #2. "Synthesized findings from Phase 1 into design goals and future testing criteria"  #3. "Updated IRB form to guide ethical immersive research"  #4. "Conducted an updated literature review to support the development of Phase 2 planning"
Image titled "Testing" with an orange icon of a person holding a magnifying glass and gear above their head. Below the title are two orange boxes with white text. The first box reads: "Facilitated and analyzed interviews and participatory co-design sessions in Phase 1." The second box reads: "Defined caption styles, mapped them to 360° scenes, and supported Phase 2 test materials, including rating scales and interview prompts."
Image titled "Reporting" with a red icon of a document above the heading. Below is a large red box with white text that reads: "Co-author of the research paper and visual presentation."

Understanding the Problem


Methods- Phase 1 Interviews & Participatory Co-Design:

A three-column, three-row comparison chart showing captioning perspectives in a VR 360° medical video setting.  #1. Top row (Video Screenshot): Three screenshots of a VR 360° video inside a hospital room with medical staff. The first shows a group of staff within view, the second shows one person within view, and the third shows everyone off view.  #2. Middle row (Birds-Eye View): Simplified diagrams with gray human figures and a purple block, illustrating relative positions of people and captions from above.  #3. Bottom row (View in VR): Simplified VR views with gray avatars showing the perspective of a user: multiple staff within view, one staff member within view, and no staff in view.

In Phase 1, we conducted remote co-design sessions with DHH participants to explore caption preferences for 360° VR videos. Using scene mockups and an educational 360° video, participants created and labeled their ideal caption layouts for three scenarios: group within view, one person within view, and all speakers off view. Each task was followed by interview questions on style satisfaction, placement, and overall feedback. These concepts informed early prototypes, emphasizing clarity, reduced visual load, and dynamic behavior.

Image: Caption markups from Participant 3 (P3).

Insights led to the development of early prototypes featuring directional labels for off-view dialogue, speaker-attached captions, and adaptive positioning to minimize occlusion and divided attention. These designs laid the groundwork for Phase 2.

Lastly, AI tools were used to transcribe the interviews; each researcher then performed an individual thematic analysis using open coding. We then came together as a group to compare insights, refine our codes, and uncover key themes across all participant interviews.

Findings- Phase 1 Interviews & Participatory Co-Design:

These sessions revealed patterns in caption preferences, accessibility values, and key design needs for immersive environments.

Methods- Literature Review:

I reviewed research on caption designs, clarity, and tracking for DHH users.

The insights from this review were used to refine our prototype conditions and test plan for Phase 2, focusing on comparing captioning styles, selecting scenes for testing, and developing questions to assess clarity and immersion.

By connecting research findings to practical test steps, this review ensured that Phase 2 is grounded in real user needs and tests what matters most in a true VR setting.

Findings- Literature Review:

Our review shaped Phase 2's design and confirmed:

Methods- Study Design & Preparation for Phase 2:

Caption: The screenshots above are from one of the five videos selected for Phase 2’s prototype.

For Phase 2, we built on the insights from Phase 1 by designing a structured VR headset study to test six captioning styles across multiple short clips from several videos, exploring genres such as educational, animated, and live-action. Each style was defined and matched to specific clips to control for factors like speaker visibility, position, movement, and scene complexity.

A table with two columns labeled Captioning Condition and Description. Rows list six captioning styles:  #1. Baseline – head-locked static captions that move with the user’s head.  #2. Baseline + Verbal Indicator – adds text cues like “[Person 1]” or “[behind]” to captions.  #3. Speaker-Attached Traditional – captions appear near the speaker in standard text.  #4. Speaker-Attached Bubble – captions appear near the speaker inside a bubble.  #5. Off-View Traditional – captions appear in the direction of an off-view speaker in standard text.  #6. Off-View Bubble – captions appear as directional bubbles pointing toward off-screen speakers.

A Graeco-Latin square rotation plan will ensure each participant views different caption types and genres without repetition. After each clip, participants will rate the captions on readability, visual blocking, speaker clarity, and comprehension using a scale from ‘strongly agree’ to ‘strongly disagree.’ They will also answer interview questions about what works well and what could be improved.
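The rotation idea above can be sketched in a few lines. The sketch below only balances the order of the six caption conditions with a cyclic Latin square (each condition appears once per participant group and once per presentation slot); the study's full Graeco-Latin square additionally crosses genre, which is omitted here for simplicity. Condition names come from the table above; the grouping scheme itself is illustrative, not the study's actual assignment.

```python
# Illustrative counterbalancing sketch: a cyclic Latin square over the
# six caption conditions. Row i is the viewing order for participant
# group i; no condition repeats within any row or any slot (column).

CONDITIONS = [
    "Baseline",
    "Baseline + Verbal Indicator",
    "Speaker-Attached Traditional",
    "Speaker-Attached Bubble",
    "Off-View Traditional",
    "Off-View Bubble",
]

def rotation_plan(conditions):
    """Return one presentation order per participant group.

    Each row is a cyclic shift of the condition list, so every
    condition occupies every presentation slot exactly once.
    """
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    for i, order in enumerate(rotation_plan(CONDITIONS), start=1):
        print(f"Group {i}: {order}")
```

With six groups cycling through six conditions, order effects (fatigue, practice) are spread evenly across conditions before ratings are compared.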

 

Additionally, building on Phase 1's findings, Phase 2’s research questions will investigate caption styles and placement, both within and out of view. 

Findings- Study Design & Preparation for Phase 2:

While no new findings have been collected yet (VR headset testing is upcoming), our preparation produced:

Three stacked sections with icons and text.  #1. Green section with a circuit icon: “Finalized prototype condition matrix mapping 6 caption styles to selected VR video clips.”  #2. Blue section with a VR headset icon: “Created and curated 34 short 360° video clips (45 seconds to 1 minute each) across 8 immersive scenes spanning 3 different genres.”  #3. Beige section with a gear and checklist icon: “Completed set of subjective rating statements and follow-up interview questions to measure caption clarity, immersion, and usability.”

We also enhanced our research questions to guide the next round, including:

An icon of a person standing next to a flag with a “CC” symbol labeled:

How do different placement strategies for within-view captions align with user values and goals?

An icon of a person wearing a VR headset with floating panels and a speech bubble labeled:

How do visual styles like bubbles or traditional captions affect clarity and immersion for off-view speakers?

Impact & Next Steps


Actionable Recommendations:

Drawing from our Phase 1 findings, we outlined 7 user needs and design goals to address user challenges and guide future caption development in VR.

Ongoing Opportunities:

While this project has already uncovered valuable insights and promising design directions, it is still ongoing and continues to reveal areas for improvement and future directions that we are actively addressing.

Circular diagram divided into two halves labeled Areas of Improvement (left) and Future Directions (right).  Areas of Improvement (left side):  Time management – The process of synthesizing survey data required more time than anticipated.  Recruitment challenges – Difficulties due to fraudulent responses and limited diversity, with overrepresentation of white female participants.  Scheduling constraints – Interview phase delayed by extended IRB approval timelines, emphasizing need for planning procedural delays.  Future Directions (right side):  Present survey findings – Preparing a clear and impactful presentation of key findings.  Conduct interview phase – Post-IRB approval, interviews to launch in early September with qualitative synthesis to follow.  Conference preparation – Drafting and preparing research paper and presentation for CHI or ASSETS submission, focusing on diverse perspectives.


© 2024 by Jessica Polk-Williams.
