Emotional distress detection has become an active research topic in recent years, driven by growing concern about mental health and by the complex nature of distress identification. A central challenge is to understand and detect emotional distress in humans using non-invasive technology. Personalized affective cues offer such a non-invasive approach, drawing on visual, vocal, and verbal cues to recognize the affective state. In this paper, we propose a multimodal hierarchical weighted framework for recognizing emotional distress. We use negative emotions to detect behavior that is not outwardly apparent. To capture facial cues, we employ hybrid models consisting of a transfer-learned residual network and CNN models; the extracted facial features are processed and fused at the decision level using a weighted approach. For audio cues, we employ two different models that exploit the capabilities of LSTMs and CNNs, and fuse their results at the decision level. For textual cues, we use a BERT transformer to learn from the extracted features. We propose a novel decision-level adaptive hierarchical weighted algorithm that fuses the results of the different modalities and use it to detect a person's emotional distress from visual, verbal, and vocal cues. Experiments on multiple datasets, including FER2013, JAFFE, CK+, RAVDESS, TESS, ISEAR, the Emotion Stimulus dataset, and the Daily-Dialog dataset, demonstrate the effectiveness and usability of the proposed architecture. Experiments on the eNTERFACE'05 dataset for distress detection have demonstrated significant results.
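
To make the idea of hierarchical decision-level fusion concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes each model outputs a probability vector over a shared emotion label set, fuses the models within a modality by a weighted average, fuses the modalities the same way, and sums the probability mass of the negative emotions as a distress score. The label set, the choice of negative emotions, all weights, and the function names (`weighted_fuse`, `hierarchical_fuse`, `distress_score`) are illustrative assumptions; in particular, the weights here are fixed, whereas the proposed algorithm is adaptive.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed shared label set and assumed negative subset (illustrative only).
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
NEGATIVE = {"anger", "disgust", "fear", "sadness"}

# One modality: (per-model probability vectors, per-model weights, modality weight).
Modality = Tuple[Sequence[Sequence[float]], Sequence[float], float]


def weighted_fuse(probs: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted average of probability vectors; weights are normalized to sum to 1."""
    if not probs or len(probs) != len(weights):
        raise ValueError("need one weight per probability vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(probs[0])
    for vector, weight in zip(probs, weights):
        if len(vector) != len(fused):
            raise ValueError("probability vectors must share one label set")
        for i, value in enumerate(vector):
            fused[i] += (weight / total) * value
    return fused


def hierarchical_fuse(modalities: Dict[str, Modality]) -> List[float]:
    """Level 1: fuse models inside each modality. Level 2: fuse the modalities."""
    per_modality = []
    modality_weights = []
    for model_probs, model_weights, modality_weight in modalities.values():
        per_modality.append(weighted_fuse(model_probs, model_weights))
        modality_weights.append(modality_weight)
    return weighted_fuse(per_modality, modality_weights)


def distress_score(fused: Sequence[float]) -> float:
    """Total fused probability assigned to the assumed negative emotions."""
    return sum(p for label, p in zip(EMOTIONS, fused) if label in NEGATIVE)


if __name__ == "__main__":
    # Made-up probabilities and weights, purely for illustration.
    example = {
        "visual": ([[0.5, 0.1, 0.1, 0.1, 0.1, 0.1],
                    [0.3, 0.1, 0.1, 0.3, 0.1, 0.1]], [1.0, 1.0], 2.0),
        "text": ([[0.0, 0.0, 0.0, 1.0, 0.0, 0.0]], [1.0], 2.0),
    }
    fused = hierarchical_fuse(example)
    print(dict(zip(EMOTIONS, fused)), distress_score(fused))
```

Because fusion happens on output probabilities and not on intermediate features, each modality's models can be trained on separate datasets and combined afterwards; a decision threshold on the distress score would be a further assumption and is left out here.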