Google: Personal Health Large Language Model and Agent Research

Original abstract

Large language models (LLMs) can retrieve, reason over, and infer a wide range of information. In health, most LLM research to date has focused on clinical tasks.
However, mobile and wearable devices, which are rarely integrated into clinical tasks, provide a rich, continuous, longitudinal source of data for personal health monitoring. This paper presents the Personal Health Large Language Model (PH-LLM), a version of Gemini fine-tuned to understand and reason over numerical time-series personal health data for sleep and fitness applications.
To systematically evaluate PH-LLM, we created and curated three new benchmark datasets that test:
1) production of personalized insights and recommendations from measured sleep patterns, physical activity, and physiological responses;
2) expert domain knowledge; and
3) prediction of self-reported sleep quality outcomes.
For the insights and recommendations task, we created 857 case studies in sleep and fitness. Designed in collaboration with domain experts, these case studies represent real-world scenarios and test the model's ability to understand the data and provide guidance.
Through comprehensive human and automated evaluation against domain-specific rubrics, we observed no statistically significant difference between Gemini Ultra 1.0 or PH-LLM and human experts in fitness. Although experts still outperform the models in sleep, fine-tuning PH-LLM yielded significant improvements in applying relevant domain knowledge and personalizing information.
To further assess expert domain knowledge, we evaluated PH-LLM on multiple-choice sleep medicine and fitness exams.
PH-LLM scored 79% on the sleep exam (N=629 questions) and 88% on the fitness exam (N=99 questions), both exceeding the average score of a sample of human experts as well as the benchmarks for receiving continuing credit in these domains. To enable PH-LLM to predict self-reported assessments of sleep quality, we trained the model to predict self-reported sleep disruption and sleep impairment outcomes from textual and multimodal encoded representations of wearable sensor data.
We show that multimodal encoding is both necessary and sufficient to match the performance of a suite of discriminative models in predicting these outcomes. Although further development and evaluation are needed in the safety-critical domain of personal health, these results demonstrate the broad knowledge base and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications, as done with PH-LLM.
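To make the difference between the two input encodings concrete, here is a toy Python sketch. It is not PH-LLM's code: the `DailySummary` fields, the table layout, and the linear adapter are illustrative assumptions. Textual encoding serializes sensor values into the prompt as plain text; multimodal encoding instead passes a numeric feature vector through an adapter that produces embedding vectors ("soft tokens") the model consumes alongside ordinary text tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailySummary:
    # Hypothetical per-day wearable aggregates; field names are illustrative only.
    date: str
    sleep_minutes: int
    resting_hr_bpm: int
    steps: int


def encode_as_text(days: List[DailySummary]) -> str:
    """Textual encoding: render daily summaries as a plain-text table for a prompt."""
    header = "date | sleep_min | resting_hr_bpm | steps"
    rows = [f"{d.date} | {d.sleep_minutes} | {d.resting_hr_bpm} | {d.steps}" for d in days]
    return "\n".join([header] + rows)


def encode_as_soft_tokens(
    features: List[float],
    weight: List[List[float]],
    bias: List[float],
    num_tokens: int,
) -> List[List[float]]:
    """Multimodal encoding (toy): a hypothetical linear adapter maps a numeric
    feature vector to num_tokens embedding vectors. weight has one row per
    output value, i.e. num_tokens * embed_dim rows."""
    if len(weight) != len(bias) or len(weight) % num_tokens != 0:
        raise ValueError("weight/bias rows must equal num_tokens * embed_dim")
    # Flat projection: out[j] = sum_i weight[j][i] * features[i] + bias[j]
    out = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weight, bias)]
    dim = len(out) // num_tokens
    # Split the flat projection into num_tokens vectors of size dim.
    return [out[t * dim:(t + 1) * dim] for t in range(num_tokens)]


if __name__ == "__main__":
    days = [
        DailySummary("2024-01-01", 420, 58, 8000),
        DailySummary("2024-01-02", 390, 60, 6500),
    ]
    print(encode_as_text(days))
    # Two soft tokens of dimension 2 from a 2-feature input (made-up weights).
    print(encode_as_soft_tokens([1.0, 2.0], [[1, 0], [0, 1], [1, 1], [2, 0]], [0, 0, 0, 0], 2))
```

In a real system the adapter would presumably be a learned network trained jointly with the fine-tuned model rather than hand-set weights; the point of the sketch is only that the text path spends prompt tokens on digits, while the multimodal path hands the model dense vectors directly.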

For more details, see the link below the video.

Paper：https://arxiv.org/abs/2406.06474
YouTube: