Project BlueSky - Dev Patel

Overview

Project BlueSky was an early attempt to build more emotionally aware mental health technology. Most mental health chatbots operate only on text, but emotional state is also expressed through tone, facial movement, and speech rhythm, not just the content of what someone says. I wanted to explore whether a system could combine these signals into a richer picture of a patient’s state.

The system used a multimodal deep learning pipeline to analyze facial expressions, voice patterns, and spoken language. The audio pipeline extracted MFCCs and spectrogram features, then used recurrent models to capture how emotional cues change over time. A DeepSpeech2-inspired ASR model converted speech into text while preserving the broader audio context needed for downstream analysis.
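
To make the MFCC step concrete, here is a minimal sketch of the usual recipe: frame the signal, window it, take a power spectrum, apply a mel filterbank, take logs, and apply a DCT. It is written in plain Python so it runs without dependencies. The function names, frame sizes, and filter counts are illustrative assumptions, not the project's actual settings, and a real pipeline would more likely use a library implementation and feed the resulting frames to a recurrent model.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum.

    Uses a naive O(n^2) DFT, which is fine for the short frames in this sketch.
    """
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=64, hop=32, n_filters=12, n_coeffs=6):
    """Return one list of n_coeffs cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = power_spectrum(signal[start:start + frame_len])
        # Log mel energies; the small constant avoids log(0) on empty filters.
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                   for row in bank]
        # DCT-II decorrelates the log filterbank energies.
        coeffs = [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                      for j, e in enumerate(log_mel))
                  for c in range(n_coeffs)]
        features.append(coeffs)
    return features


# Example: 256 samples of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
frames = mfcc(tone, 8000)
print(len(frames), len(frames[0]))  # 7 frames, 6 coefficients each
```

Each row of the output is one time step, which is the shape a recurrent model expects when it reads the audio as a sequence.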

For facial sentiment, the pipeline used OpenCV-based face localization, 68-point facial landmark detection, and CNN classifiers that recognized emotion from the face image pixels. The idea was to combine geometric facial movement with image-level classification to produce a more robust emotional signal than either method alone.
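
As an illustration of the geometric side, the sketch below turns a set of 68 landmarks into a few scale-normalized features. It assumes the common 68-point layout (0-based indices: eyes at 36-47, outer lips at 48-59, inner lips at 60-67); the write-up does not say which layout or which features the project used, so the feature names and choices here are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def eye_aspect_ratio(landmarks, start):
    """Ratio of eye height to eye width for the six eye points at start..start+5."""
    p1, p2, p3, p4, p5, p6 = landmarks[start:start + 6]
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))


def geometric_features(landmarks):
    """Compute a small feature dict from 68 (x, y) landmarks.

    Distances are divided by the outer-eye-corner distance so the
    features do not depend on how large the face is in the frame.
    """
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")
    interocular = dist(landmarks[36], landmarks[45])
    return {
        # Gap between inner top and inner bottom lip centers.
        "mouth_open": dist(landmarks[62], landmarks[66]) / interocular,
        # Distance between the two mouth corners.
        "mouth_width": dist(landmarks[48], landmarks[54]) / interocular,
        "eye_aspect_36": eye_aspect_ratio(landmarks, 36),
        "eye_aspect_42": eye_aspect_ratio(landmarks, 42),
    }
```

Features like these can be tracked frame to frame to capture movement, and concatenated with a CNN's output before the final classification.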

The NLP layer processed transcribed speech using part-of-speech tagging, named entity recognition, key-phrase extraction, and BiLSTM-CRF sequence modeling. This helped connect emotional shifts to concrete topics or phrases in the user’s narrative, making the system more interpretable than a black-box emotion classifier.
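
In a BiLSTM-CRF tagger, the BiLSTM produces a score for each tag at each token, and the CRF layer adds tag-to-tag transition scores and picks the best overall tag sequence with Viterbi decoding. The sketch below shows only that decoding step in plain Python. The tag set and all scores are made up for illustration; in a trained model both the emission and transition scores would be learned.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][j]   score of tag j at token t (e.g. from a BiLSTM)
    transitions[i][j] score of moving from tag i to tag j
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j on this token.
            best_prev = max(range(n_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[i] for i in path]


# Hypothetical example: BIO tags for key-phrase spans over three tokens.
tags = ["O", "B", "I"]
transitions = [
    [0.0, 0.0, -1e4],  # O -> I is effectively forbidden
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [2.0, 1.0, 0.0],
    [0.0, 0.5, 2.0],
    [0.0, 0.0, 2.0],
]
print(viterbi_decode(emissions, transitions, tags))  # ['B', 'I', 'I']
```

Taken token by token, the first token would be tagged O, but the transition penalty makes the decoder prefer a sequence that starts the span with B. This sequence-level consistency is the reason for pairing a CRF with the BiLSTM.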

The goal was not to replace clinicians. It was to build better sensing infrastructure for mental health contexts: tools that could help surface emotional patterns, track mood over time, and give professionals a more data-rich view of a patient’s lived experience.

Technical Highlights

Multimodal emotion recognition across face, voice, and text
MFCC and spectrogram-based audio preprocessing
DeepSpeech2-inspired ASR pipeline
OpenCV facial detection and 68-point landmark modeling
CNN-based facial emotion classification
BiLSTM-CRF NLP pipeline for phrase and entity extraction
PyTorch, OpenCV, NLTK, and pretrained model integration
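
One simple way to combine per-modality predictions is late fusion: each modality produces its own probability distribution over emotions, and the distributions are averaged with per-modality weights. The description above does not say how the face, voice, and text signals were combined, so this is an assumed approach, and the function name, labels, and weights are hypothetical.

```python
def late_fusion(modality_probs, weights=None):
    """Weighted average of per-modality emotion distributions.

    modality_probs maps a modality name to a {label: probability} dict.
    weights maps a modality name to a non-negative weight (default: equal).
    Only modalities present in modality_probs contribute, so a missing
    signal (e.g. no face in frame) is handled by leaving it out.
    """
    if not modality_probs:
        raise ValueError("need at least one modality")
    weights = weights or {name: 1.0 for name in modality_probs}
    total = sum(weights[name] for name in modality_probs)
    fused = {}
    for name, probs in modality_probs.items():
        share = weights[name] / total
        for label, p in probs.items():
            fused[label] = fused.get(label, 0.0) + share * p
    return fused


# Hypothetical outputs from three per-modality classifiers.
fused = late_fusion({
    "face": {"happy": 0.6, "sad": 0.4},
    "voice": {"happy": 0.2, "sad": 0.8},
    "text": {"happy": 0.4, "sad": 0.6},
})
print(max(fused, key=fused.get))  # sad
```

Late fusion keeps each modality's model independent, which makes it easy to debug or swap one out; the tradeoff is that it cannot learn interactions between modalities the way a jointly trained model can.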