Across 211 Ring wearers, five biometric signals drift from personal baseline around a self-reported illness

Hrithik Basu Roy, Aditi Shanmugam, Kanika Gupta, Vinayak Narasimhan

Ultrahuman Healthcare Pvt. Ltd.

Summary

Resting heart rate is the most consistent signal. Night RHR rose on all four nights, from +0.15 bpm three nights out to +3.62 bpm on the sick night. The other four signals reached their largest deviation on the sick night but did not move monotonically toward it: sleep HRV −3.8 ms, skin temperature +0.37°C, REM −18 min, sleep duration −44 min.
The composite separates sick nights from ordinary nights, modestly. Against 4,079 non-illness nights from the same members, the four-signal composite reached 81% specificity and 62% sensitivity (AUC 0.76). Because illness nights are rare, single-night positive predictive value is low: the composite is useful as a trend and as context, not as a standalone alarm.
When the Ring and the member both flag an episode, neither is reliably first. Across 68 episodes flagged by both within a week, the Ring led on 17 and the member on 24; the median difference is zero days.
Fever and immune-load episodes look different. Fever-tagged bodies warm two nights early (+0.52σ at T−2 to +1.89σ on the sick night); immune-load episodes stay near baseline on temperature, carried instead by the cardiovascular and sleep signals.
Menstrual-cycle phase inflates the female temperature signal. Uncorrected, female detection runs 12 points above male; after cycle correction the gap halves to 5 points (84% vs 79%), and the residual is not temperature-driven.

Background and Rationale

A wearable that records skin temperature, heart rate, HRV and sleep architecture every night creates a chance that standard care does not have: to watch the same physiological signals before, during and after an illness, measured against the person's own recent normal. Population thresholds miss people whose baseline differs from the average; a personal baseline asks a narrower question, whether tonight is different from your own recent nights.

Prior work with consumer wearables has shown that resting heart rate elevation can accompany the onset of respiratory infection, and that combining wearable signals with self-reported symptoms improves identification of likely cases [1, 2]. Those studies largely relied on single signals or population thresholds. Sleep is closely tied to immune function [4, 6], and shorter sleep is associated with greater susceptibility to the common cold [5], which is part of why a panel that includes sleep architecture, rather than heart rate alone, is worth tracking around an illness. This study describes how a panel of signals moves around a logged illness in a single member base, scores a personal-baseline composite, and, importantly, measures that composite against the same members' non-illness nights so that the sick-night numbers carry a false-positive rate.

This is an observational, descriptive study. Illness here is an in-app tag, not a clinical diagnosis, and the analysis reports associations between biometric deviation and a self-reported or system-flagged sick day. It does not establish that the Ring detects, diagnoses or predicts disease.

Methods

Cohort. The study covers 211 Ultrahuman Ring members (150 male, 61 female) who carried an illness tag with a sick date between 15 October and 30 November 2025. Each episode supplies a four-night window, the sick night (T=0) and the three nights before it (T−1, T−2, T−3), and a 28-day personal baseline ending before the window. Of the 211 episodes, 203 had at least five baseline nights and were used for composite scoring (146 male, 57 female; 8 excluded for insufficient baseline). Category and sex counts elsewhere refer to this baseline-valid subset.

How episodes were tagged. Of the 211 episodes, 186 (88%) carried a system-generated tag rather than a member entry, and these differ in how much the member was involved. In 139 episodes (66%), the Ring logged a silent immune-system check with no member action and then auto-accepted it. In 46 (22%), the member actively confirmed a temperature-triggered fever card that the Ring had surfaced. One further system tag arrived through another in-app surface. The remaining 25 episodes comprise 24 member-entered tags and one member-confirmed system tag. These counts describe how members entered the cohort. They are not a measured detection sensitivity: because the cohort is defined by these tags, an illness the system missed and the member never logged cannot appear in the data, so true and false negatives are unobservable.

Signals. Eight nightly signals are tracked: night RHR, skin-temperature deviation, sleep HRV (RMSSD), REM-sleep minutes, sleep duration, average sleeping heart rate, lowest nocturnal heart rate and deep-sleep minutes. Each is referenced to the member's 28-day baseline, except average and lowest sleeping heart rate, which have no 28-day baseline in this dataset and are referenced to each member's T−3 night.
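The personal-baseline comparison can be sketched as follows. This is a minimal illustration and not the Ring's implementation: the function name `baseline_deviation`, the use of the sample standard deviation for the z-score, and the example values are assumptions. Only the five-night minimum comes from the Cohort paragraph.

```python
from statistics import mean, stdev

# From the Cohort paragraph: at least five baseline nights are required.
MIN_BASELINE_NIGHTS = 5


def baseline_deviation(baseline_nights, tonight):
    """Compare tonight's value with a member's own baseline nights.

    Returns (delta, z), where delta is tonight minus the baseline mean and
    z is delta divided by the baseline sample standard deviation (an
    assumed definition). Returns None when the baseline is too short.
    """
    values = [v for v in baseline_nights if v is not None]
    if len(values) < MIN_BASELINE_NIGHTS:
        return None
    mu = mean(values)
    sd = stdev(values)
    delta = tonight - mu
    z = delta / sd if sd > 0 else None  # undefined for a flat baseline
    return delta, z


# Hypothetical member: five baseline nights of resting heart rate (bpm).
example = baseline_deviation([58.0, 60.0, 59.0, 61.0, 57.0], 62.62)
```

Missing nights are passed as `None` and dropped before the night count is checked, so a sparse 28-day window can still fail the five-night requirement.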

The composite. A four-signal composite scores each night for deviation from the personal baseline: skin-temperature z-score above 1 (one point) or above 2 (two points), REM sleep more than 20% below baseline (one point), sleep duration more than one hour below baseline (one point) and night RHR more than 5% above baseline (one point). Temperature is the only double-weighted signal, so a single strong temperature reading can reach the two-point threshold on its own (this happens in 6 of the scored sick nights); counting each signal once lowers the two-point rate from 61.6% to 58.6%. Sleep HRV is reported throughout but is not part of this composite.
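The scoring rule above can be written out as a short function. The thresholds are the ones stated in the paragraph; the function and parameter names are illustrative, and the `double_weight_temperature` switch is an assumed way of expressing the "count each signal once" variant.

```python
def composite_score(temp_z, rem_min, rem_baseline_min,
                    sleep_min, sleep_baseline_min,
                    rhr_bpm, rhr_baseline_bpm,
                    double_weight_temperature=True):
    """Score one night against the personal baseline (0 to 5 points)."""
    score = 0
    # Skin temperature: two points above z = 2, one point above z = 1.
    if temp_z > 2 and double_weight_temperature:
        score += 2
    elif temp_z > 1:
        score += 1
    # REM sleep more than 20% below baseline.
    if rem_min < 0.80 * rem_baseline_min:
        score += 1
    # Sleep duration more than one hour (60 minutes) below baseline.
    if sleep_baseline_min - sleep_min > 60:
        score += 1
    # Night RHR more than 5% above baseline.
    if rhr_bpm > 1.05 * rhr_baseline_bpm:
        score += 1
    return score


# Hypothetical night: a strong temperature reading with nothing else abnormal.
hot_only = composite_score(2.3, 100, 100, 420, 420, 58, 58)
hot_only_single_weight = composite_score(2.3, 100, 100, 420, 420, 58, 58,
                                         double_weight_temperature=False)
```

In this hypothetical case the temperature-only night reaches two points under double weighting and one point when each signal counts once, which is the difference behind the 61.6% versus 58.6% two-point rates.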

Control nights and specificity. To give the sick-night fire rates a false-positive rate, the same composite was scored on 4,079 non-illness nights drawn from the same members' baseline windows, restricted to nights at least seven days before the sick day to avoid early-illness contamination. Specificity is the share of control nights that do not fire; sensitivity is the share of sick nights that do; the area under the ROC curve (AUC) summarises separation across the score thresholds. A leave-one-out variant, which scores each control night against a baseline that excludes that night, gives a slightly higher false-positive rate (47% versus 43% for one or more signals). A control night that sits inside its own baseline pulls that baseline toward itself and shrinks its apparent deviation, so the reported specificity is, if anything, slightly optimistic.
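A minimal sketch of these three quantities, computed from lists of composite scores. The function names and the toy scores are illustrative assumptions; the AUC uses the Mann-Whitney form (the share of sick/control pairs in which the sick night scores higher, with ties counted as half).

```python
def sensitivity_specificity(sick_scores, control_scores, threshold=2):
    """Sensitivity and specificity at a 'score >= threshold' firing rule."""
    sensitivity = sum(s >= threshold for s in sick_scores) / len(sick_scores)
    specificity = sum(c < threshold for c in control_scores) / len(control_scores)
    return sensitivity, specificity


def auc_mann_whitney(sick_scores, control_scores):
    """Probability that a random sick night outscores a random control night."""
    wins = 0.0
    for s in sick_scores:
        for c in control_scores:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5  # ties are common with an integer score
    return wins / (len(sick_scores) * len(control_scores))


# Toy scores for illustration only, not the study data.
toy_sick = [0, 1, 2, 3]
toy_control = [0, 0, 1, 2]
toy_sens, toy_spec = sensitivity_specificity(toy_sick, toy_control)
toy_auc = auc_mann_whitney(toy_sick, toy_control)
```

The pairwise loop is quadratic, which is acceptable at the scale described here (203 sick nights against 4,079 control nights); a rank-based formulation would be the usual choice for larger sets.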

Statistics. Rates are reported with 95% bootstrap confidence intervals (CIs) computed by resampling members; the ROC AUC is the Mann-Whitney estimate. Trajectory values are means against the personal baseline. The study is observational and exploratory; the analysis does not apply a multiple-comparison correction across the full signal panel, and the CIs should be read with that in mind.
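Resampling members rather than nights keeps each member's nights together, which respects the fact that nights from one person are not independent. The sketch below assumes a percentile bootstrap; the source does not say which bootstrap interval was used, and the function name, the number of resamples and the input layout are all illustrative.

```python
import random


def bootstrap_rate_ci(nights_by_member, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a fire rate, resampling whole members.

    nights_by_member maps a member id to a list of 0/1 flags,
    where 1 means the composite fired on that night.
    """
    rng = random.Random(seed)
    members = list(nights_by_member)
    rates = []
    for _ in range(n_boot):
        # Draw members with replacement; all of a member's nights travel together.
        sample = rng.choices(members, k=len(members))
        fired = sum(sum(nights_by_member[m]) for m in sample)
        total = sum(len(nights_by_member[m]) for m in sample)
        if total:
            rates.append(fired / total)
    rates.sort()
    lower = rates[int((alpha / 2) * len(rates))]
    upper = rates[int((1 - alpha / 2) * len(rates)) - 1]
    return lower, upper


# Hypothetical flags for three members.
toy_ci = bootstrap_rate_ci({"a": [1, 0], "b": [0, 0, 0], "c": [1]})
```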

Results

Five signals drift from baseline, and only resting heart rate rises every night. Night RHR climbed across all four nights, from +0.15 bpm at T−3 to +3.62 bpm on the sick night. The other signals reached their largest deviation on the sick night without a clean night-by-night climb: sleep HRV sat slightly above baseline three nights out (+0.5 ms) before falling to 3.8 ms below it, skin temperature reached +0.37°C, REM sleep fell 18 minutes and sleep duration 44 minutes (Figure 1). Read against a personal baseline, a sick night is visibly different from a member's recent nights across several systems at once.

Figure 1. Five signals drift from baseline across the four nights before illness. Each signal is plotted against the member's own 28-day baseline, on its own scale, from T−3 to the sick night. Night resting heart rate is the only signal that rises on every night; the other four reach their largest deviation on the sick night but move non-monotonically. Sleep HRV sits slightly above baseline three nights out before falling.

Against the members' own non-illness nights, the composite separates sick nights modestly. No single signal is reliable on its own: on the sick night each individual rule fires in fewer than half of episodes (temperature 44%, REM 46%, sleep duration 44%, RHR 48%, HRV 44%). The same lesson shows up outside this cohort. The Ring's most direct single-signal alert, the temperature-triggered fever card, is dismissed by members about half the time they act on it (435 of 809 answered cards across the full member base, 54%; 47% within this study's window), a reminder that one elevated reading, even surfaced as a direct prompt, is often not experienced as illness.

The composite does better. At its two-signal threshold it fires on 62% of sick nights and 19% of control nights, which corresponds to 62% sensitivity and 81% specificity; lowering the threshold to one or more signals raises sensitivity to 82% but also raises the control fire rate to 43% (57% specificity). Across the score thresholds the AUC is 0.76 (Mann-Whitney p < 0.001). The composite fire rate climbs from the control baseline (19% at two signals) through the three nights before illness (28% to 31%) to the sick night (62%), so the nights before the sick day do carry a modest signal above ordinary nights (Figure 2).

Two cautions apply. The separation is real but moderate: an AUC of 0.76, not the near-certainty an alarm implies. And because illness nights are rare, the share of composite-positive nights that are genuinely sick nights is low at the level of a single night; the value is in the trend across nights and as context for how a member is feeling, not as a standalone alert.
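The low single-night positive predictive value follows directly from Bayes' rule. The sketch below uses the reported 62% sensitivity and 81% specificity; the prevalence values are assumptions chosen for illustration. The first is simply the ratio of sick to control nights in this dataset, which is not a real-world illness rate, and the second is a hypothetical 1%.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of composite-positive nights that are truly sick nights."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.62  # reported, two-signal threshold
SPECIFICITY = 0.81  # reported, two-signal threshold

# Assumption 1: prevalence equal to this dataset's sick/control mix.
dataset_mix = 203 / (203 + 4079)
ppv_dataset_mix = positive_predictive_value(SENSITIVITY, SPECIFICITY, dataset_mix)
# about 0.14

# Assumption 2: a hypothetical 1% of nights are sick nights.
ppv_one_percent = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.01)
# about 0.03
```

Under either assumption most composite-positive nights are ordinary nights, which is why the composite reads better as a trend across nights than as a single-night alert.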

Figure 2. Sick-night versus control-night composite fire rate, and the ROC. Left: the composite fire rate rises from non-illness control nights (19% at two signals) through the pre-illness window to the sick night (62%); the dashed line is the one-or-more-signal rate. Right: the ROC of the composite score against control nights, AUC 0.76, with the two-signal operating point marked (62% sensitivity, 81% specificity). Sick nights n = 203; control nights n = 4,079 from the same members.

Fever and immune-load episodes sit at different points of the same window. The 55 fever-tagged episodes show skin temperature already +0.52σ above baseline two nights before the sick day, rising to +1.89σ on the sick night. The 119 immune-load episodes stay near baseline on temperature throughout (+0.17σ on the sick night) and are carried instead by RHR elevation (+3.8 bpm) and a REM-sleep drop (26 minutes). Temperature flags the fever episodes early; the immune-load episodes are visible only through the cardiovascular and sleep signals (Figure 3).

Figure 3. Fever versus immune-load temperature trajectory. Skin-temperature z-score from T−3 to the sick night, split by illness tag. Fever-tagged bodies warm from two nights out; immune-load bodies stay near baseline on temperature.

When the Ring and the member both flag an episode, neither is reliably first. In 68 episodes the Ring's background algorithm and the member's own log fell within seven days of each other. The Ring flagged first in 17 of these (mean lead 2.8 days, up to 6 days), both flagged on the same day in 27, and the member flagged first in 24 (up to 6 days ahead). The median difference is zero days (Figure 4). The 17 Ring-first cases are genuine examples of the system surfacing an episode before the member logged anything, and they should be read alongside the 24 cases where the member noticed first; this is co-detection within a window, not a measured lead.
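The zero median follows from the counts alone, whatever the individual gaps are. The sketch below uses the sign convention of Figure 4 (positive is Ring-first) and placeholder gap sizes of one day, since the per-episode gaps are not given in this report; only the 24/27/17 split comes from the text.

```python
from statistics import median


def summarise_gaps(day_gaps):
    """Count Ring-first, same-day and member-first episodes.

    day_gaps holds one signed gap per episode: positive means the Ring
    flagged first, negative means the member logged first.
    """
    return {
        "ring_first": sum(g > 0 for g in day_gaps),
        "same_day": sum(g == 0 for g in day_gaps),
        "member_first": sum(g < 0 for g in day_gaps),
        "median_gap": median(day_gaps),
    }


# Placeholder magnitudes (1 day each); only the counts are from the study.
placeholder_gaps = [-1] * 24 + [0] * 27 + [1] * 17
summary = summarise_gaps(placeholder_gaps)
```

With 68 sorted gaps the median is the mean of the 34th and 35th values, and both fall inside the block of 27 same-day episodes, so the median is zero for any choice of gap sizes.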

Figure 4. Ring-first versus member-first onset across cross-flagged episodes. Day-gap between the Ring's flag and the member's own log across the 68 episodes flagged by both within a week. Negative is member-first (24 episodes), zero is same day (27), positive is Ring-first (17). The median is zero.

HRV falls as resting heart rate rises. At the nightly level, sleep HRV and night RHR move in opposite directions across the window, and both move most on the sick night: HRV falls from 46 ms at T−3 to 42 ms on the sick night, while RHR rises from 58 to 62 bpm (Figure 5). This opposing movement is consistent with a shift toward sympathetic autonomic activity, a hypothesis these nightly aggregates support but do not prove.

Figure 5. Nightly sleep HRV and resting heart rate across four nights. Sleep HRV (ms) and night resting heart rate (bpm) across the four nights. Both reach their largest deviation on the sick night. Sleep HRV n = 209; night RHR n = 201.

The floor of overnight recovery rises too. Average sleeping heart rate rose toward the sick night (+3.3 bpm above the T−3 night), and so did the lowest nocturnal heart rate (+3.5 bpm). Deep-sleep minutes fell across the same window (Figure 6). The lowest nocturnal heart rate marks the point at which the cardiovascular system most fully settles overnight; its elevation on the sick night indicates that overnight recovery did not reach its usual floor.

Figure 6. Sleep heart-rate architecture across four nights. Average sleeping heart rate, lowest nocturnal heart rate and deep-sleep minutes across the four nights, each referenced to the member's T−3 night. The average and the floor of sleeping heart rate both rise toward the sick night while deep sleep erodes. n = 169 to 181 members with complete sleep staging.

Menstrual-cycle phase inflates the female temperature signal, and correcting for it halves the female detection gap. Progesterone in the luteal phase raises basal body temperature [3], and that rise carries into the Ring's nightly temperature signal. Of 57 female episodes, 36 had cycle data; among those, 21 fell in the luteal phase. Uncorrected, female sick-night temperature deviation averaged +0.45°C against +0.34°C for males, and uncorrected female detection ran 12 points above male (91% versus 79%). Applying phase-specific corrections brings female sick-night temperature to +0.23°C, below the male figure, and the female detection advantage falls to 5 points (84% versus 79%), now carried by the cardiovascular and sleep signals rather than by temperature (Figure 7). A temperature signal read without cycle awareness will sit high in the luteal phase and low in the follicular phase for female members.
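A phase-specific correction of this kind amounts to subtracting an expected cycle-related offset from the raw temperature deviation. The sketch below is illustrative only: the phase names and the follicular and early-luteal offsets are hypothetical placeholders, and the late-luteal value uses the 0.40°C maximum quoted in the Figure 7 caption. It is not the correction table used in the study.

```python
# Expected cycle-related offset in degrees C, by phase.
# HYPOTHETICAL values except late_luteal, which uses the stated 0.40 maximum.
PHASE_OFFSET_C = {
    "follicular": -0.10,   # placeholder: signal sits low in this phase
    "early_luteal": 0.20,  # placeholder
    "late_luteal": 0.40,   # "removes up to 0.40 degrees C"
}


def cycle_corrected_deviation(raw_deviation_c, phase, offsets=PHASE_OFFSET_C):
    """Remove the expected cycle-phase offset from a temperature deviation.

    Episodes without cycle data (phase is None) are returned unchanged.
    """
    if phase is None:
        return raw_deviation_c
    return raw_deviation_c - offsets[phase]


# Hypothetical late-luteal sick night with a raw deviation of +0.45 C.
corrected = cycle_corrected_deviation(0.45, "late_luteal")
```

Leaving uncorrected the episodes that lack cycle data mirrors the situation described above, where 21 of the 57 female episodes had no cycle data to correct with.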

Figure 7. Menstrual-cycle temperature correction and its effect on detection. Left: raw versus cycle-corrected sick-night temperature deviation by phase; the late-luteal correction removes up to 0.40°C. Right: female detection falls from 91% uncorrected to 84% corrected, against 79% for males. n = 57 female episodes (36 with cycle data), 146 male.

Discussion, Limitations and Future Directions

A personal-baseline panel of nightly signals moves in a recognisable way around a logged illness. Night resting heart rate is the steadiest of the signals, rising on every night of the window; temperature, HRV, sleep architecture and the floor of overnight heart rate all reach their largest deviation on the sick night. The clearest single result is the comparison the original analysis lacked: scored against the same members' non-illness nights, the four-signal composite separates a sick night from an ordinary night with an AUC of 0.76. That is a real effect and a moderate one, and naming it plainly is more useful than a detection percentage with no false-positive rate behind it.

Several limitations bound these findings, and they matter for how the numbers should be read.

First, there is no clinical ground truth. Illness is an in-app tag, with no diagnosis, test result, symptom severity or device-independent onset date. The shared-source concern is sharpest for the 139 episodes the Ring flagged silently, where the signal being judged and the label come from the same system. The 46 confirmed through a fever card were surfaced by temperature but endorsed by the member, and the 25 member-entered or member-confirmed tags are largely independent of the biometric signal. Because the cohort is defined by these tags, it cannot speak to how many true illnesses the system misses.

Second, the separation is modest and the per-night positive predictive value is low. An AUC of 0.76 means a randomly chosen sick night outscores a randomly chosen control night about three times in four. Because illness nights are rare relative to ordinary nights, a composite-positive single night is far more often an ordinary night than a sick one; the composite is informative as a trend and as context, not as a standalone alarm.

Third, the control nights are the members' own baseline-window nights rather than an independently sampled period. A leave-one-out check, which removes each scored night from its own baseline, raises the false-positive rate slightly (47% versus 43% at one or more signals), so the reported specificity is slightly optimistic rather than conservative; an out-of-window control month would strengthen the estimate further.

Fourth, the granular within-night HRV is not reported here. The five-minute HRV records in this dataset mixed readings of different quality tiers and scales, and the within-night suppression that appeared in a pooled view did not survive a within-person paired analysis (nine paired members, p = 0.13; the effect fell to about 5% once low-quality readings were removed). We therefore report only the nightly HRV aggregates, which reconcile with the validated nightly source data, and hold the intra-night timing question until the granular data can be re-pulled and verified.

Finally, the window is short (about six weeks of sick dates) and the design is observational and exploratory. Mechanistic language, such as a shift toward sympathetic autonomic activity, is offered as a hypothesis consistent with the HRV and heart-rate pattern, not as something this dataset measures. The analysis reports point estimates with bootstrap confidence intervals and does not correct for the full set of signal-by-night comparisons.

The directions that would most strengthen this work follow from those limits: an independently ascertained illness set with clinical confirmation to anchor ground truth; an out-of-window control period and prospective follow-up to turn the specificity estimate into a deployed performance figure; a clean re-pull of the granular HRV stream to settle the within-night timing question; and cycle-aware temperature on every female-facing surface, since the uncorrected signal sits systematically high in the luteal phase.

What this study establishes is narrower than a detection claim and, we think, more useful. The body's nightly signals do move around an illness, against a member's own baseline, in a way a continuous wearable can see; and the size of that movement, measured honestly against ordinary nights, is real and moderate.

References

1. Mishra T, Wang M, Metwally AA, et al. Pre-symptomatic detection of COVID-19 from smartwatch data. Nature Biomedical Engineering 2020;4(12):1208–1220. DOI: 10.1038/s41551-020-00640-6. PMID: 33208926.
2. Quer G, Radin JM, Gadaleta M, et al. Wearable sensor data and self-reported symptoms for COVID-19 detection. Nature Medicine 2021;27(1):73–77. DOI: 10.1038/s41591-020-1123-x. PMID: 33122860.
3. Su HW, Yi YC, Wei TY, Chang TC, Cheng CM. Detection of ovulation, a review of currently available methods. Bioengineering & Translational Medicine 2017;2(3):238–246. DOI: 10.1002/btm2.10058. PMID: 29313033.
4. Besedovsky L, Lange T, Born J. Sleep and immune function. Pflügers Archiv – European Journal of Physiology 2012;463(1):121–137. DOI: 10.1007/s00424-011-1044-0. PMID: 22071480.
5. Prather AA, Janicki-Deverts D, Hall MH, Cohen S. Behaviorally assessed sleep and susceptibility to the common cold. Sleep 2015;38(9):1353–1359. DOI: 10.5665/sleep.4968. PMID: 26118561.
6. Opp MR, Krueger JM. Sleep and immunity: a growing field with clinical impact. Brain, Behavior, and Immunity 2015;47:1–3. DOI: 10.1016/j.bbi.2015.03.011. PMID: 25849976.