Castor

Personal Voice Assist with Voice Cloning

Goal

For some forms of throat cancer, the necessary surgery will likely take away the patient's voice, either temporarily or permanently. In addition, many patients, regardless of their condition, may need to be intubated, leaving them unable to speak.


Figure 1 - Castor App Home View

Innovation

Tools and apps already exist that allow patients to vocalize their needs or ask questions. However, with recent machine learning models, we can now clone a person's voice and let the AI speak on their behalf, in their own voice. For this project, I created an iOS iPad app tailored to patient needs: patients pre-record their voice while it is still functional, and the app can then mimic that same voice when in use.


Figure 2 - Castor App Pain Indicator View

Impact

The ability to continue speaking in one's own voice should provide confidence and comfort to a patient who has undergone a traumatic event. Better still, because I developed the application in-house, we can let the patient continue using it even after discharge, since ownership lies entirely with the hospital rather than with a third-party SaaS vendor.


Figure 3 - Castor App Multidimensional (Arousal - Valence) Indicator View

Technology

My team and I built a mobile application that uses augmented reality and computer vision to automatically track the patient during the sit-to-stand exercise and overlay this information on top of the video feed. Specifically, we used the posture-recognition machine learning models in the iOS Vision framework, which track the location of several key body landmarks, primarily joints. This is similar to facial recognition, where key facial features (eyes, nose, mouth) are detected and tracked. Once the landmarks are detected in the image, the algorithm converts them into points in 3D space, creating a skeleton model. The landmarks and their connections can also be overlaid on the video feed, increasing the information available to the clinician viewing it.

The resulting data (both the absolute and relative landmark positions over time) can be used to automatically count the number of completed sit-to-stands. More importantly, it allows the exercise to be a) replayed and b) replayed from any angle deemed informative: the skeleton model can be viewed from the side as it moves, even though the feed itself was only ever recorded from the front. Finally, by analyzing the relative positions over time, we can compute various medically relevant metrics, such as how much tremor was observed, or whether the patient performed the exercise asymmetrically.
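As a minimal, language-agnostic sketch of how such metrics can be derived from the landmark positions over time (shown here in Python for brevity; the function names, joint choices, and thresholds are illustrative assumptions, not the app's actual implementation):

```python
# Hypothetical sketch: deriving exercise metrics from per-frame landmark data.
# All names and thresholds below are illustrative assumptions.

def count_sit_to_stands(hip_heights, stand_at=0.7, sit_at=0.35):
    """Count completed repetitions from the vertical hip position per frame
    (normalized coordinates, larger = higher). A repetition is counted each
    time the hip rises above `stand_at` after having dropped below `sit_at`."""
    reps, standing = 0, False
    for h in hip_heights:
        if not standing and h >= stand_at:
            standing = True
            reps += 1
        elif standing and h <= sit_at:
            standing = False
    return reps

def asymmetry(left_knee_ys, right_knee_ys):
    """Mean absolute height difference between the left and right knee
    trajectories -- a crude proxy for an asymmetric movement pattern."""
    return sum(abs(l - r) for l, r in zip(left_knee_ys, right_knee_ys)) / len(left_knee_ys)
```

A real pipeline would operate on the full 3D skeleton rather than single scalar trajectories, but the principle is the same: simple functions of landmark positions over time yield clinically interpretable numbers.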


Figure 4 - Castor App Sentiment Indicator

Technologies Used