Mental health problems impact the quality of life of millions of people around the world. However, diagnosing
mental health disorders is challenging because it often relies on patients' self-reports of their
behavioral patterns and social interactions. Therefore, there is a need for new strategies for diagnosis and
daily monitoring of mental health conditions. The recent introduction of body-area networks, built from the
many accurate sensors embedded in smartwatches and smartphones, together with edge-compatible deep neural
networks (DNNs), points toward a possible solution. Such wearable medical sensors (WMSs) enable continuous
monitoring of physiological signals in a passive and non-invasive manner. However, disease diagnosis
based on WMSs and DNNs, and the deployment of such models on edge devices such as smartphones, remain
challenging. These challenges stem from the difficulty of feature engineering and knowledge distillation
from the raw sensor data, as well as the computational and memory constraints of battery-operated edge
devices. To this end, we propose a framework called MHDeep that utilizes commercially available WMSs and
efficient DNN models to diagnose three important mental health disorders: schizoaffective, major depressive,
and bipolar. MHDeep uses eight different categories of data obtained from sensors integrated in a smartwatch
and smartphone. These categories include various physiological signals and additional information on motion
patterns and environmental variables related to the wearer. MHDeep eliminates the need for manual feature
engineering by directly operating on the data streams obtained from participants. Because the amount of data
is limited, MHDeep uses a synthetic data generation module to augment real data with synthetic data drawn
from the same probability distribution. We use the synthetic dataset to pre-train the weights of the DNN
models, thus imposing a prior on the weights. A grow-and-prune DNN synthesis approach learns
both the architecture and the weights during the training process. We evaluate the MHDeep models, trained
with data collected from 74 individuals, on three different data partitions. We conduct two types of evaluations: at
the data instance level and at the patient level. At the instance level, MHDeep achieves an average test
accuracy, across the three data partitions, of 90.4%, 87.3%, and 82.4%, respectively, for classifying
healthy versus schizoaffective disorder instances, healthy versus major depressive disorder instances,
and healthy versus bipolar disorder instances. At the patient level, MHDeep DNN models achieve an accuracy
of 100%, 100%, and 90.0% for the three mental health disorders, respectively, based on inference that uses
40, 16, and 22 minutes of sensor data collected from each patient.
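
The relationship between the two evaluation levels can be illustrated with a minimal sketch. It assumes that a patient-level decision is obtained by aggregating the binary predictions made on that patient's data instances, here with a simple majority vote. The abstract does not state the aggregation rule, so the voting scheme, the 0.5 threshold, and the function names below are hypothetical choices made only for illustration.

```python
def patient_level_label(instance_predictions, threshold=0.5):
    """Aggregate per-instance binary predictions (0 = healthy, 1 = disorder)
    into one patient-level label.

    Assumption: majority vote with a hypothetical threshold of 0.5; the
    source does not specify how instance-level outputs are combined.
    """
    if not instance_predictions:
        raise ValueError("at least one instance prediction is required")
    positive_fraction = sum(instance_predictions) / len(instance_predictions)
    return 1 if positive_fraction > threshold else 0


def patient_level_accuracy(predictions_by_patient, true_label_by_patient):
    """Fraction of patients whose aggregated label matches the true label."""
    correct = sum(
        patient_level_label(preds) == true_label_by_patient[patient_id]
        for patient_id, preds in predictions_by_patient.items()
    )
    return correct / len(predictions_by_patient)


# Toy, made-up predictions (not data from the study).
toy_predictions = {
    "p1": [1, 1, 0, 1],  # mostly flagged as disorder
    "p2": [0, 0, 1, 0],  # mostly flagged as healthy
}
toy_truth = {"p1": 1, "p2": 0}
toy_accuracy = patient_level_accuracy(toy_predictions, toy_truth)
```

Under this assumed scheme, individual instance errors can be outvoted by the remaining instances, which is one plausible reason patient-level accuracy can exceed instance-level accuracy when enough minutes of sensor data are collected.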