Proc Natl Acad Sci U S A. 2008 Nov 25;105(47):18153-8.

doi: 10.1073/pnas.0800332105. Epub 2008 Nov 18.

A Poissonian explanation for heavy tails in e-mail communication

R Dean Malmgren¹, Daniel B Stouffer, Adilson E Motter, Luís A N Amaral

PMID: 19017788
PMCID: PMC2587567
DOI: 10.1073/pnas.0800332105

R Dean Malmgren et al. Proc Natl Acad Sci U S A. 2008.

Affiliation

¹ Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA.

Abstract

Patterns of deliberate human activity and behavior are of utmost importance in areas as diverse as disease spread, resource allocation, and emergency response. Because of its widespread availability and use, e-mail correspondence provides an attractive proxy for studying human activity. Recently, it was reported that the probability density for the inter-event time τ between consecutively sent e-mails decays asymptotically as τ^(−α), with α ≈ 1. The slower-than-exponential decay of the inter-event time distribution suggests that deliberate human activity is inherently non-Poissonian. Here, we demonstrate that the approximate power-law scaling of the inter-event time distribution is a consequence of circadian and weekly cycles of human activity. We propose a cascading nonhomogeneous Poisson process that explicitly integrates these periodic patterns in activity with an individual's tendency to continue participating in an activity. Using standard statistical techniques, we show that our model is consistent with the empirical data. Our findings may also provide insight into the origins of heavy-tailed distributions in other complex systems.
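The abstract contrasts Poissonian behavior with heavy tails. That contrast rests on a standard property of Poisson processes, summarized below. These are textbook results, not equations reproduced from the paper. A homogeneous process with constant rate λ has exponentially distributed inter-event times, so on its own it cannot produce a τ^(−α) tail. For a nonhomogeneous process with rate ρ(t), the waiting time τ after an event at time t depends instead on the integrated rate:

```latex
% Homogeneous Poisson process with constant rate \lambda
p(\tau) = \lambda\, e^{-\lambda \tau},
\qquad
\Pr(\text{wait} > \tau) = e^{-\lambda \tau}

% Nonhomogeneous Poisson process with rate \rho(t): waiting time after an event at time t
\Pr(\text{wait} > \tau \mid t) = \exp\!\left(-\int_{t}^{t+\tau} \rho(s)\, ds\right),
\qquad
p(\tau \mid t) = \rho(t+\tau)\, \exp\!\left(-\int_{t}^{t+\tau} \rho(s)\, ds\right)
```

Averaging p(τ ∣ t) over event times t mixes waiting-time distributions with different local rates. Suppose ρ(t) alternates between busy periods and long quiet stretches such as nights and weekends. The resulting mixture can then decay much more slowly than a single exponential with the same mean, which is the mechanism the abstract proposes.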

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
Example of a periodic and cascading stochastic process. (A) Expected probability of starting an active interval during a particular day of the week *p_w*(t). We depict 2 weeks to emphasize that this pattern is periodic and that every week is statistically identical to every other week. We surmise that e-mail users are more likely to send e-mails on the same days of the week, a consequence of regular work schedules. (B) Expected probability of starting an active interval during a particular time of the day *p_d*(t). Again, we depict 14 days to emphasize that this pattern is periodic and that every day is statistically identical to every other day. We surmise that e-mail users are more likely to send e-mails during the same times of the day, a consequence of circadian sleep patterns. (C) The resulting activity rate ρ(t) for the nonhomogeneous Poisson process. The activity rate ρ(t) is proportional to the product of the daily and weekly patterns of activity, where the proportionality constant *N_w* is the average number of active intervals per week (Eq. 1). (D) A time series of events generated by a nonhomogeneous Poisson process. Each event in this time series initiates a cascade of additional events, an active interval. (E) Schematic illustration of cascading activity. During cascades (active intervals), we expect that an individual will send *N_a* additional e-mails according to a homogeneous Poisson process with rate ρ_a. We denote the start of active intervals with a dashed line to signify that the activity is no longer governed by the nonhomogeneous Poisson process rate ρ(t). Once the active interval concludes, e-mail usage is again governed by the periodic rate ρ(t). We refer to the collection of active intervals as the active interval configuration C throughout the manuscript. (F) Observed time series. Because the data do not isolate intervals of activity, the observed time series is the superposition of both the nonhomogeneous Poisson process time series and the active interval time series.
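Below is a minimal Python sketch (standard library only) of the generative process in panels A through F. Several details are assumptions, not taken from the paper:

- Time is measured in hours.
- `p_week` holds 7 per-day probabilities and `p_day` holds 24 per-hour probabilities, each summing to 1. Then ρ(t) = N_w · p_week · p_day integrates to N_w active intervals per week. This is one reading of Eq. 1, which is not reproduced here.
- The number of additional e-mails N_a is Poisson-distributed with a hypothetical mean `n_a_mean`.
- The nonhomogeneous process is sampled by thinning.

The names `rate`, `poisson_sample`, and `simulate` are illustrative only.

```python
import math
import random

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def rate(t, n_w, p_week, p_day):
    """Primary-process rate rho(t) in events per hour, piecewise constant per hour.

    Assumes p_week (7 entries, day 0 taken as Monday) and p_day (24 entries) each
    sum to 1, so the integral of rho over one week equals n_w.
    """
    hour_of_week = int(t) % HOURS_PER_WEEK
    day, hour = divmod(hour_of_week, HOURS_PER_DAY)
    return n_w * p_week[day] * p_day[hour]


def poisson_sample(mean, rng):
    """Knuth's multiplication method for a Poisson variate (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n_w, p_week, p_day, rho_a, n_a_mean, weeks, seed=0):
    """Event times (hours) of a sketch of the cascading nonhomogeneous Poisson process."""
    rng = random.Random(seed)
    horizon = weeks * HOURS_PER_WEEK
    rho_max = n_w * max(p_week) * max(p_day)  # upper bound on rho(t), used for thinning
    events, t = [], 0.0
    while True:
        t += rng.expovariate(rho_max)  # candidate from a homogeneous process at rho_max
        if t >= horizon:
            break
        if rng.random() >= rate(t, n_w, p_week, p_day) / rho_max:
            continue  # candidate thinned out
        events.append(t)  # accepted: this e-mail starts an active interval
        for _ in range(poisson_sample(n_a_mean, rng)):  # N_a additional e-mails
            t += rng.expovariate(rho_a)  # homogeneous rate rho_a inside the cascade
            events.append(t)
        # Memorylessness lets the primary process restart from the end of the cascade.
    return [e for e in events if e < horizon]


# Hypothetical activity profiles: busier on weekdays and during working hours.
p_week = [0.18] * 5 + [0.05] * 2             # Monday..Sunday, sums to 1
p_day = [0.0] * 8 + [0.1] * 10 + [0.0] * 6   # active 08:00-18:00, sums to 1
events = simulate(n_w=20, p_week=p_week, p_day=p_day, rho_a=6.0, n_a_mean=2.0, weeks=10)
gaps = [b - a for a, b in zip(events, events[1:])]  # inter-event times tau, in hours
```

Following panel E, the primary process is paused while a cascade runs and restarts from the last cascade event. This is valid because the Poisson process is memoryless. Histogramming `gaps` on logarithmic bins is one way to inspect the resulting inter-event time distribution.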

**Fig. 2.**
Systematic deviations of the data from the truncated power-law null model due to periodic patterns of human activity. The vertical lines at τ = 10 hours are meant as a guide to the eye. (A and B) Comparison of the truncated power-law model (red line) with empirical data (open squares) for Users 2650 and 467 from the dataset (23). Lines of best fit are estimated by minimizing the area test statistic (see *Null Model* in *SI Text*). (C and D) Log-residuals, R = ln(p_ℳ(τ∣θ̂)/p(τ)), of the best-fit truncated power-law distribution model ℳ, where p(τ) is the empirical inter-event time distribution. The shaded region denotes inter-event times where the null model underestimates the data. If the empirical inter-event time distribution were well-described by the truncated power-law null model, the log-residuals R would be small and normally distributed, particularly in the tail of the distribution. However, the log-residuals R have large systematic fluctuations in the tail of the inter-event time distribution (τ > 0.25 hours) where the power-law scaling approximately holds. (E) Conditional probability density p(R∣τ) obtained for all 394 users under consideration. The average log-residual at each inter-event time is represented by the dashed line. Both the average log-residual and conditional probability density indicate that nearly all users under consideration systematically deviate from the truncated power-law null model, as anticipated from the arguments in *Empirical Patterns*.
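The log-residual diagnostic in panels C through E can be sketched in Python as follows. The exact truncated power-law null model is specified in the SI and is not reproduced here. As a stand-in, this sketch assumes a pure power law τ^(−α) truncated to [τ_min, τ_max]. Empirical densities are estimated on logarithmically spaced bins and compared with the model density at each bin's geometric center. All function names are hypothetical.

```python
import math
import random


def truncated_power_law_pdf(tau, alpha, tau_min, tau_max):
    """Assumed null-model form: tau**-alpha on [tau_min, tau_max], normalized (alpha != 1)."""
    if not tau_min <= tau <= tau_max:
        return 0.0
    norm = (1.0 - alpha) / (tau_max ** (1.0 - alpha) - tau_min ** (1.0 - alpha))
    return norm * tau ** (-alpha)


def sample_truncated_power_law(n, alpha, tau_min, tau_max, rng):
    """Inverse-transform sampling from the same assumed form."""
    a, b = tau_min ** (1.0 - alpha), tau_max ** (1.0 - alpha)
    return [(a + rng.random() * (b - a)) ** (1.0 / (1.0 - alpha)) for _ in range(n)]


def log_residuals(samples, model_pdf, n_bins=20):
    """(bin center, R) pairs with R = ln(p_model / p_empirical) on log-spaced bins.

    Samples must be positive with at least two distinct values; bins with no
    samples are skipped (R undefined); model_pdf must be positive on the sample range.
    """
    lo, hi = min(samples), max(samples)
    edges = [lo * (hi / lo) ** (i / n_bins) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in samples:
        i = min(int(n_bins * math.log(x / lo) / math.log(hi / lo)), n_bins - 1)
        counts[i] += 1
    residuals = []
    for i, c in enumerate(counts):
        if c == 0:
            continue
        p_emp = c / (len(samples) * (edges[i + 1] - edges[i]))  # empirical density in bin i
        center = math.sqrt(edges[i] * edges[i + 1])             # geometric bin center
        residuals.append((center, math.log(model_pdf(center) / p_emp)))
    return residuals
```

If the model actually generated the data, R scatters around zero in every populated bin. Long runs of same-signed R in the tail, as in panels C through E, instead point to a systematic misfit.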

**Fig. 3.**
Patterns of e-mail activity for 4 users in increasing order of e-mail usage (see *SI Appendix* for the same analysis for all 394 users). These e-mail users exemplify the e-mail usage patterns that are typical of the users in the dataset. We use simulated annealing to identify active intervals and calculate the parameters for the cascading nonhomogeneous Poisson process (see *Methods*). The red distributions and text in A and B correspond to the parameters for the primary process, a nonhomogeneous Poisson process, whereas the blue distributions and text (C) correspond to the parameters for the secondary process, a homogeneous Poisson process. (A and B) Active intervals are much more likely on weekdays than on weekends, and during the daytime than at night. These prolonged periods of inactivity lead to the heavy tail in the inter-event time distribution. (C) Small inter-event times, in contrast, are characteristic of active intervals. One can interpret active intervals in several ways: larger ρ_a may indicate that a user is a more proficient e-mail user; larger 〈*N_a*〉/ρ_a may suggest that an individual has a larger attention span; *N_a*/ρ_a may be the time that an individual has to check e-mail before their next commitment.
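Reading 〈N_a〉/ρ_a as a time scale follows from the cascade description in Fig. 1E. Assume the gaps Δ_i between the N_a additional e-mails are independent exponential variables with rate ρ_a, and that they are independent of N_a. The length D of an active interval then satisfies:

```latex
D = \sum_{i=1}^{N_a} \Delta_i,
\qquad
\Delta_i \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\rho_a),
\qquad
\mathbb{E}[\Delta_i] = \frac{1}{\rho_a}

\mathbb{E}[D \mid N_a] = \frac{N_a}{\rho_a},
\qquad
\mathbb{E}[D] = \mathbb{E}\big[\,\mathbb{E}[D \mid N_a]\,\big] = \frac{\langle N_a \rangle}{\rho_a}
```

So N_a/ρ_a is the expected length of one active interval with N_a additional e-mails, and 〈N_a〉/ρ_a is the mean length over all active intervals.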

**Fig. 4.**
Comparison of the predictions of the cascading nonhomogeneous Poisson process (red line) with the empirical cumulative distribution of inter-event times P(τ) (black line) for the same users from Fig. 3 (see *SI Appendix* for the same analysis for all 394 users). We use the area test statistic A (Eq. 2) and Monte Carlo hypothesis testing to calculate the P value between the model and the data (see *Monte Carlo Hypothesis Testing* in *SI Text*). As these figures are presented, the area test statistic A is the area between the two curves. Not only do the predictions of the cascading nonhomogeneous Poisson process visually agree with the empirical data, but the P values also indicate that the model cannot be rejected as a model of e-mail activity at a conservative 5% significance level.
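Here is a Python sketch of an area test statistic and a Monte Carlo P value. Eq. 2 is not reproduced in this record, so the definition used here is an assumption. A is taken as the absolute difference between the empirical and model cumulative distributions, integrated over ln τ on a user-supplied grid with the trapezoidal rule. This matches the description of A as the area between two curves on a logarithmic τ axis. Because |F₁ − F₂| = |(1 − F₁) − (1 − F₂)|, the area is the same whether cumulative or complementary cumulative distributions are plotted.

The sketch departs from the SI procedure in two possible ways:

- It keeps the model parameters fixed instead of refitting them to each synthetic sample.
- It uses the common (k + 1)/(n + 1) Monte Carlo estimator.

The function names and the `model_sampler` callable are hypothetical.

```python
import bisect
import math
import random


def empirical_cdf(samples):
    """Return a function t -> fraction of samples <= t."""
    xs = sorted(samples)
    return lambda t: bisect.bisect_right(xs, t) / len(xs)


def area_statistic(samples, model_cdf, grid):
    """Trapezoidal area between empirical and model CDFs, integrated over ln(tau)."""
    emp = empirical_cdf(samples)
    diffs = [abs(emp(t) - model_cdf(t)) for t in grid]
    return sum(0.5 * (diffs[i] + diffs[i + 1]) * math.log(grid[i + 1] / grid[i])
               for i in range(len(grid) - 1))


def monte_carlo_p_value(samples, model_cdf, model_sampler, grid, n_sims=200, seed=0):
    """Fraction of same-size synthetic samples from the model with area >= observed."""
    rng = random.Random(seed)
    a_obs = area_statistic(samples, model_cdf, grid)
    n_extreme = 0
    for _ in range(n_sims):
        synthetic = [model_sampler(rng) for _ in samples]
        if area_statistic(synthetic, model_cdf, grid) >= a_obs:
            n_extreme += 1
    return (n_extreme + 1) / (n_sims + 1)


# Usage with a hypothetical exponential model (rate 1 per hour) and synthetic data.
model_cdf = lambda t: 1.0 - math.exp(-t)
sampler = lambda r: r.expovariate(1.0)
grid = [10 ** (-3 + 6 * i / 199) for i in range(200)]  # 1e-3 to 1e3 hours, log-spaced
data_rng = random.Random(7)
matched = [data_rng.expovariate(1.0) for _ in range(200)]   # drawn from the model itself
shifted = [data_rng.expovariate(0.05) for _ in range(200)]  # much slower activity
p_matched = monte_carlo_p_value(matched, model_cdf, sampler, grid)
p_shifted = monte_carlo_p_value(shifted, model_cdf, sampler, grid)
```

`p_shifted` is expected to land at the smallest attainable value, 1/201, because no synthetic sample strays as far from the model as the slow-activity data do. `p_matched`, by contrast, behaves like a draw from an approximately uniform distribution.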

**Fig. 5.**
Model comparisons. (A) Summary of the hypothesis-testing results for the cascading nonhomogeneous Poisson process and the truncated power-law null model for the 394 users under consideration. For each user, we compute the P value between their inter-event time distribution and the predictions of each model (see *Monte Carlo Hypothesis Testing* in *SI Text*). We reject a model for a particular user if the P value is less than the 5% rejection threshold (gray shaded region). At this significance level, the cascading nonhomogeneous Poisson process can be rejected for 1 user, whereas the truncated power-law null model can be rejected for 344 users (see *Null Model* in *SI Text*). Note that if the data were actually generated by one of the models tested, we would expect to see a uniform distribution of P values (dashed line). That this is very nearly the case for the cascading nonhomogeneous Poisson process provides additional evidence that our model is consistent with the data. (B) Conditional probability density p(R∣τ) obtained for all 394 users under consideration. The average log-residual at each inter-event time is represented by the dashed line. In contrast to the results in Fig. 2E, we find no systematic deviations between the model predictions and the data in the tail of the inter-event time distribution where the power-law scaling approximately holds.
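A summary like the one in panel A can be computed from a list of per-user P values with a few lines of Python. The sketch below is a hypothetical helper, not the paper's code. It does three things:

- counts rejections at the 5% threshold;
- bins the P values into a histogram;
- reports the Kolmogorov–Smirnov distance to the uniform distribution, as one possible measure of how flat that histogram is.

If the P values were uniform, the expected number of rejections would be 0.05 × n, roughly 20 for 394 users.

```python
def summarize_p_values(p_values, alpha=0.05, n_bins=10):
    """Rejection count and uniformity diagnostics for a list of per-user P values."""
    n = len(p_values)
    rejected = sum(1 for p in p_values if p < alpha)
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1  # p == 1 goes into the last bin
    xs = sorted(p_values)
    # Kolmogorov-Smirnov distance between the empirical CDF of P and the uniform CDF.
    ks = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    return {
        "rejected": rejected,
        "expected_rejections": alpha * n,  # mean count if every P value were uniform
        "histogram": counts,
        "expected_per_bin": n / n_bins,    # flat histogram expected under uniformity
        "ks_distance": ks,
    }
```

For example, `summarize_p_values(p_values)["rejected"]` gives the number of users for whom a model is rejected at the 5% level.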

Cited by

Rich gets simpler.
Lambiotte R. Proc Natl Acad Sci U S A. 2016 Sep 6;113(36):9961-2. doi: 10.1073/pnas.1612364113. Epub 2016 Aug 23. PMID: 27555586.
Unraveling dynamics of human physical activity patterns in chronic pain conditions.
Paraschiv-Ionescu A, Buchser E, Aminian K. Sci Rep. 2013;3:2019. doi: 10.1038/srep02019. PMID: 23779003.
Universal features of correlated bursty behaviour.
Karsai M, Kaski K, Barabási AL, Kertész J. Sci Rep. 2012;2:397. doi: 10.1038/srep00397. Epub 2012 May 4. PMID: 22563526.
Evidence for a bimodal distribution in human communication.
Wu Y, Zhou C, Xiao J, Kurths J, Schellnhuber HJ. Proc Natl Acad Sci U S A. 2010 Nov 2;107(44):18803-8. doi: 10.1073/pnas.1013140107. Epub 2010 Oct 19. PMID: 20959414.
Bursts and heavy tails in temporal and sequential dynamics of foraging decisions.
Jung K, Jang H, Kralik JD, Jeong J. PLoS Comput Biol. 2014 Aug 14;10(8):e1003759. doi: 10.1371/journal.pcbi.1003759. PMID: 25122498.

References

1. Smith A. An Inquiry into the Nature and Causes of the Wealth of Nations. London: Methuen; 1786.
2. Pareto V. Manuale di Economia Politica. Milan: Societa Editrice; 1906.
3. Zipf GK. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Cambridge, MA: Addison–Wesley; 1949.
4. Stanley MHR, et al. Scaling behaviour in the growth of companies. Nature. 1996;379:804–806.
5. Huberman BA, Pirolli PLT, Pitkow JE, Lukose RM. Strong regularities in world wide web surfing. Science. 1998;280:95–97.
