Proc Natl Acad Sci U S A. 2008 Nov 25;105(47):18153-8.
doi: 10.1073/pnas.0800332105. Epub 2008 Nov 18.

A Poissonian explanation for heavy tails in e-mail communication

R Dean Malmgren et al. Proc Natl Acad Sci U S A. 2008.

Abstract

Patterns of deliberate human activity and behavior are of utmost importance in areas as diverse as disease spread, resource allocation, and emergency response. Because of its widespread availability and use, e-mail correspondence provides an attractive proxy for studying human activity. Recently, it was reported that the probability density for the inter-event time τ between consecutively sent e-mails decays asymptotically as τ^(−α), with α ≈ 1. The slower-than-exponential decay of the inter-event time distribution suggests that deliberate human activity is inherently non-Poissonian. Here, we demonstrate that the approximate power-law scaling of the inter-event time distribution is a consequence of circadian and weekly cycles of human activity. We propose a cascading nonhomogeneous Poisson process that explicitly integrates these periodic patterns in activity with an individual's tendency to continue participating in an activity. Using standard statistical techniques, we show that our model is consistent with the empirical data. Our findings may also provide insight into the origins of heavy-tailed distributions in other complex systems.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Example of a periodic and cascading stochastic process. (A) Expected probability of starting an active interval during a particular day of the week, p_w(t). We depict 2 weeks to emphasize that this pattern is periodic and that every week is statistically identical to every other week. We surmise that e-mail users are more likely to send e-mails on the same days of the week, a consequence of regular work schedules. (B) Expected probability of starting an active interval during a particular time of the day, p_d(t). Again, we depict 14 days to emphasize that this pattern is periodic and that every day is statistically identical to every other day. We surmise that e-mail users are more likely to send e-mails during the same times of the day, a consequence of circadian sleep patterns. (C) The resulting activity rate ρ(t) for the nonhomogeneous Poisson process. The activity rate ρ(t) is proportional to the product of the daily and weekly patterns of activity, where the proportionality constant N_w is the average number of active intervals per week (Eq. 1). (D) A time series of events generated by a nonhomogeneous Poisson process. Each event in this time series initiates a cascade of additional events, an active interval. (E) Schematic illustration of cascading activity. During cascades (active intervals), we expect that an individual will send N_a additional e-mails according to a homogeneous Poisson process with rate ρ_a. We denote the start of active intervals with a dashed line to signify that the activity is no longer governed by the nonhomogeneous Poisson process rate ρ(t). Once the active interval concludes, e-mail usage is again governed by the periodic rate ρ(t). We refer to the collection of active intervals as the active interval configuration C throughout the manuscript. (F) Observed time series. Because the data do not isolate intervals of activity, the observed time series is the superposition of both the nonhomogeneous Poisson process time series and the active interval time series.
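The two-level process described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the weekly and daily profiles (`DAY_WEIGHTS`, `HOUR_WEIGHTS`) are hypothetical piecewise-constant shapes, the rate is normalised so that it integrates to N_w over one week, the primary process is sampled by thinning, and the number of additional e-mails in an active interval is drawn from a geometric distribution with a hypothetical continuation probability `p_continue` (the caption does not specify the distribution of N_a).

```python
import random

HOURS_PER_WEEK = 168

# Hypothetical activity profiles (assumptions, not fitted values).
DAY_WEIGHTS = [1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2]    # Monday..Sunday
HOUR_WEIGHTS = [0.05] * 8 + [1.0] * 10 + [0.3] * 6   # hours 0..23


def make_rate(n_w, day_weights, hour_weights):
    """Return (rate, rho_max); rate(t) is in events per hour, t in hours.

    rate(t) is proportional to the product of the weekly and daily profiles
    and is normalised so that it integrates to n_w over one week.
    """
    norm = sum(dw * hw for dw in day_weights for hw in hour_weights)

    def rate(t):
        slot = int(t % HOURS_PER_WEEK)      # hour-of-week index, 0..167
        day, hour = divmod(slot, 24)
        return n_w * day_weights[day] * hour_weights[hour] / norm

    rho_max = n_w * max(day_weights) * max(hour_weights) / norm
    return rate, rho_max


def simulate(rate, rho_max, rho_a, p_continue, t_end, rng):
    """Simulate event times (hours) of the cascading process on [0, t_end)."""
    events = []
    t = 0.0
    while True:
        # Primary process: thinning of a homogeneous process with rate rho_max.
        t += rng.expovariate(rho_max)
        if t >= t_end:
            break
        if rng.random() >= rate(t) / rho_max:
            continue                        # candidate rejected
        events.append(t)                    # start of an active interval
        # Secondary process: homogeneous Poisson events at rate rho_a.
        # The primary process resumes from the end of the cascade.
        while rng.random() < p_continue:
            t += rng.expovariate(rho_a)
            events.append(t)
    return [e for e in events if e < t_end]


rate, rho_max = make_rate(10.0, DAY_WEIGHTS, HOUR_WEIGHTS)
events = simulate(rate, rho_max, rho_a=2.0, p_continue=0.5,
                  t_end=52 * HOURS_PER_WEEK, rng=random.Random(0))
inter_event_times = [b - a for a, b in zip(events, events[1:])]
print(len(events), "events;", "longest gap (hours):", max(inter_event_times))
```

Because the exponential distribution is memoryless, restarting the thinning loop from the end of each cascade is equivalent to suspending the primary process during the active interval, which matches the description of panel E.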
Fig. 2.
Systematic deviations of the data from the truncated power-law null model due to periodic patterns of human activity. The vertical lines at τ = 10 hours are meant as a guide to the eye. (A and B) Comparison of truncated power-law model (red line) with empirical data (open squares) for Users 2650 and 467 from the dataset (23). Lines of best fit are estimated by minimizing the area test statistic (see Null Model in SI Text). (C and D) Log-residual, R = ln(p(τ∣θ̂)/p(τ)), of the best-fit truncated power-law distribution model ℳ. The shaded region denotes inter-event times where the null model underestimates the data. If the empirical inter-event time distribution were well described by the truncated power-law null model, the log-residuals R would be small and normally distributed, particularly in the tail of the distribution. However, the log-residuals R have large systematic fluctuations in the tail of the inter-event time distribution (τ > 0.25 hours), where the power-law scaling approximately holds. (E) Conditional probability density p(R∣τ) obtained for all 394 users under consideration. The average log-residual at each inter-event time is represented by the dashed line. Both the average log-residual and conditional probability density indicate that nearly all users under consideration systematically deviate from the truncated power-law null model, as anticipated from the arguments in Empirical Patterns.
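The log-residual defined in this caption can be computed per bin as follows. This is a sketch under assumptions: the logarithmic binning and the use of bin probabilities (the model CDF difference over each bin, compared with the fraction of observations in that bin) are illustrative choices, and `log_bin_edges` and `log_residuals` are hypothetical helper names. With R = ln(model/data), a negative R marks a bin where the model underestimates the data.

```python
import bisect
import math
import random


def log_bin_edges(tau_min, tau_max, n_bins):
    """Logarithmically spaced bin edges from tau_min to tau_max."""
    ratio = (tau_max / tau_min) ** (1.0 / n_bins)
    return [tau_min * ratio ** i for i in range(n_bins + 1)]


def log_residuals(taus, model_cdf, edges):
    """Per-bin R = ln(model mass / empirical mass).

    Returns None for bins where either mass is zero, since the logarithm
    is undefined there.
    """
    counts = [0] * (len(edges) - 1)
    for t in taus:
        i = bisect.bisect_right(edges, t) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    residuals = []
    for i, count in enumerate(counts):
        model_mass = model_cdf(edges[i + 1]) - model_cdf(edges[i])
        data_mass = count / len(taus)
        if model_mass > 0 and data_mass > 0:
            residuals.append(math.log(model_mass / data_mass))
        else:
            residuals.append(None)
    return residuals


# Usage: exponential data compared against the exponential model that
# generated them, so the residuals should scatter around zero.
rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(50000)]
edges = log_bin_edges(0.01, 10.0, 12)
print(log_residuals(sample, lambda x: 1.0 - math.exp(-x), edges))
```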
Fig. 3.
Patterns of e-mail activity for 4 users in increasing order of e-mail usage (see SI Appendix for the same analysis for all 394 users). These e-mail users exemplify the e-mail usage patterns that are typical of the users in the dataset. We use simulated annealing to identify active intervals and calculate the parameters for the cascading nonhomogeneous Poisson process (see Methods). The red distributions and text in A and B correspond with the parameters for the primary process, a nonhomogeneous Poisson process, whereas the blue distributions and text (C) correspond with the parameters for the secondary process, a homogeneous Poisson process. (A and B) Active intervals are much more likely during weekdays than during weekends and during the daytime than during the nighttime. These prolonged periods of inactivity lead to the heavy tail in the inter-event time distribution. (C) Small inter-event times, in contrast, are characteristic of active intervals. One can interpret active intervals in several ways: Larger ρ_a may indicate that a user is a more proficient e-mail user; larger ⟨N_a⟩/ρ_a may suggest that an individual has a larger attention span; N_a/ρ_a may be the time that an individual has to check e-mail before their next commitment.
Fig. 4.
Comparison of the predictions of the cascading nonhomogeneous Poisson process (red line) with the empirical cumulative distribution of inter-event times P(τ) (black line) for the same users from Fig. 3 (see SI Appendix for the same analysis for all 394 users). We use the area test statistic A (Eq. 2) and Monte Carlo hypothesis testing to calculate the P value between the model and the data (see Monte Carlo Hypothesis Testing in SI Text). As these figures are presented, the area test statistic A is the area between the two curves. Not only do the predictions of the cascading nonhomogeneous Poisson process visually agree with the empirical data, but the P values indicate that it cannot be rejected as a model of e-mail activity at a conservative 5% significance level.
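The testing procedure in this caption can be illustrated with a generic Monte Carlo hypothesis test. Eq. 2 and the SI procedure are not reproduced on this page, so the sketch below rests on assumptions: the area is taken between two empirical cumulative distributions along a logarithmic τ axis, the model is represented by a large reference sample, and the P value is the fraction of synthetic datasets whose area statistic is at least as large as the observed one. The names `area_statistic`, `monte_carlo_p_value`, and `draw_sample` are hypothetical.

```python
import bisect
import math
import random


def ecdf(sorted_sample, x):
    """Empirical cumulative distribution of a sorted sample at x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def area_statistic(sample_a, sample_b):
    """Area between two empirical CDFs, integrated over ln(tau).

    Both samples must contain strictly positive values.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    grid = sorted(set(a) | set(b))
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both step functions are constant on [left, right).
        gap = abs(ecdf(a, left) - ecdf(b, left))
        area += gap * (math.log(right) - math.log(left))
    return area


def monte_carlo_p_value(observed, draw_sample, n_reference, n_trials, rng):
    """Fraction of synthetic datasets at least as far from the model as the data."""
    reference = draw_sample(n_reference, rng)   # stands in for the model curve
    a_observed = area_statistic(observed, reference)
    exceed = 0
    for _ in range(n_trials):
        synthetic = draw_sample(len(observed), rng)
        if area_statistic(synthetic, reference) >= a_observed:
            exceed += 1
    return exceed / n_trials


def draw_exponential(n, rng):
    """Toy model used only for this illustration."""
    return [rng.expovariate(1.0) for _ in range(n)]


rng = random.Random(2)
data = draw_exponential(200, rng)               # data generated by the model
print(monte_carlo_p_value(data, draw_exponential, 2000, 100, rng))
```

A model is rejected for a user when this P value falls below the chosen threshold, 5% in the caption.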
Fig. 5.
Model comparisons. (A) Summary of the hypothesis-testing results for the cascading nonhomogeneous Poisson process and the truncated power-law null model for the 394 users under consideration. For each user, we compute the P value between their inter-event time distribution and the predictions of each model (see Monte Carlo Hypothesis Testing in SI Text). We reject a model for a particular user if the P value is less than the 5% rejection threshold (gray shaded region). At this significance level, the cascading nonhomogeneous Poisson process can be rejected for 1 user, whereas the truncated power-law null model can be rejected for 344 users (see Null Model in SI Text). Note that if the data were actually generated by one of the models tested, we would expect to see a uniform distribution of P values (dashed line). Because this is very nearly the case for the cascading nonhomogeneous Poisson process, this provides additional evidence that our model is consistent with the data. (B) Conditional probability density p(R∣τ) obtained for all 394 users under consideration. The average log-residual at each inter-event time is represented by the dashed line. In contrast to the results in Fig. 2E, we find no systematic deviations between the model predictions and the data in the tail of the inter-event time distribution where the power-law scaling approximately holds.
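The rejection counts in panel A correspond to 1/394 ≈ 0.3% of users for the cascading nonhomogeneous Poisson process and 344/394 ≈ 87.3% for the truncated power-law null model. A simple way to quantify how close a set of P values is to the uniform distribution is the Kolmogorov-Smirnov distance; the sketch below is one possible check, not a procedure taken from the paper, and the P values in the usage lines are made up for illustration.

```python
def rejection_fraction(p_values, alpha=0.05):
    """Fraction of P values below the rejection threshold alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def ks_distance_from_uniform(p_values):
    """Kolmogorov-Smirnov distance between the P values and Uniform(0, 1)."""
    ps = sorted(p_values)
    n = len(ps)
    # Compare the empirical CDF just after and just before each point
    # with the uniform CDF, which equals p at the point p.
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ps))


# Fractions implied by the counts reported in the caption.
print(1 / 394, 344 / 394)

# Hypothetical P values, for illustration only.
example = [0.03, 0.22, 0.41, 0.58, 0.77, 0.94]
print(rejection_fraction(example), ks_distance_from_uniform(example))
```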

