PLoS One. 2014 Apr 25;9(4):e95978.
doi: 10.1371/journal.pone.0095978. eCollection 2014.

Measuring large-scale social networks with high resolution

Arkadiusz Stopczynski et al. PLoS One. 2014.

Abstract

This paper describes the deployment of a large-scale study designed to measure human interactions across a variety of communication channels, with high temporal resolution and spanning multiple years: the Copenhagen Networks Study. Specifically, we collect data on face-to-face interactions, telecommunication, social networks, location, and background information (personality, demographics, health, politics) for a densely connected population of 1000 individuals, using state-of-the-art smartphones as social sensors. Here we provide an overview of the related work and describe the motivation and research agenda driving the study. Additionally, the paper details the data types measured and the technical infrastructure in terms of both backend and phone software, and outlines the deployment procedures. We document the participant privacy procedures and their underlying principles. The paper concludes with early results from data analysis, illustrating the importance of a multi-channel, high-resolution approach to data collection.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Dynamics of face-to-face interactions in the 2012 deployment.
The participants meet in the morning, attend classes within four different study lines, and interact across majors in the evening. Edges are colored according to the frequency of observation, ranging from low (blue) to high (red). With 24 possible observations per hour, the color thresholds are respectively: blue (formula image observations formula image), purple (formula image observations formula image), and red (formula image observations). Node size is linearly scaled according to degree.
Figure 2. Sensible Data openPDS architecture.
This system is used in the 2013 deployment and consists of three layers: platform, services, and applications. The platform contains elements common to multiple services (in this context, studies). The studies are the deployments of particular data collection efforts. The applications are OAuth2 clients to studies and can submit and access data, based on user authorizations.
Figure 3. Authorizations page.
Participants have an overview of the studies in which they are enrolled and which applications are able to submit to and access their data. This is an important step towards users understanding what happens with their data and exercising control over it. This figure shows a translated version of the original page that participants saw in Danish.
Figure 4. Data viewer application.
All the collected data can be explored and accessed via an API. The API is the same for research, application, and end-user access; the endpoints are protected by OAuth2 bearer tokens. Map image from USGS National Map Viewer, replacing the original image used in the deployed application (Google Maps).
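As a minimal sketch of what bearer-token access to such an API could look like, the snippet below builds (but does not send) an HTTP request with an OAuth2 bearer token in the `Authorization` header. The base URL, the `location` endpoint name, and the token value are hypothetical placeholders; the source does not name the actual endpoints.

```python
from urllib.request import Request

# Hypothetical base URL; the real service address is not given in the source.
BASE_URL = "https://example.org/api"


def build_data_request(endpoint: str, access_token: str) -> Request:
    """Build (without sending) a GET request that carries an OAuth2 bearer token."""
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    headers = {
        # Bearer tokens are sent in the Authorization header.
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }
    return Request(url, headers=headers, method="GET")


# "location" and "TOKEN" are illustrative values only.
req = build_data_request("location", "TOKEN")
print(req.full_url, req.get_header("Authorization"))
```

Because the same API serves research, application, and end-user access, a client of any kind would differ only in which token it presents, not in how the request is built.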
Figure 5. Weekly temporal dynamics of interactions.
Face-to-face interaction patterns of participants in 5-minute time-bins over two weeks. Only active participants are included, i.e. those that have either observed another person or themselves been observed in a given time-bin. On average we observed formula image edges and formula image nodes in 5-minute time-bins and registered 10 634 unique links between participants.
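The binning described in this caption can be sketched as follows, assuming each observation is a `(timestamp_seconds, observer_id, observed_id)` tuple and that links are undirected. The input format and function names are assumptions made for illustration, not the study's actual pipeline.

```python
from collections import defaultdict

BIN_SECONDS = 5 * 60  # 5-minute time-bins, as in the caption


def bin_interactions(observations):
    """Group undirected edges by 5-minute bin index.

    observations: iterable of (timestamp_seconds, observer_id, observed_id).
    """
    edges_per_bin = defaultdict(set)
    for t, a, b in observations:
        if a == b:
            continue  # ignore self-observations
        edges_per_bin[t // BIN_SECONDS].add(frozenset((a, b)))
    return edges_per_bin


def summarize(edges_per_bin):
    """Return per-bin edge/node counts and the set of unique links overall."""
    per_bin = {}
    unique_links = set()
    for bin_index, edges in edges_per_bin.items():
        # Active nodes are those that observed someone or were observed.
        active_nodes = set().union(*edges)
        per_bin[bin_index] = {"edges": len(edges), "nodes": len(active_nodes)}
        unique_links |= edges
    return per_bin, unique_links


# Toy data: three observations in the first bin, one in the second.
sample = [(0, "a", "b"), (100, "b", "a"), (200, "b", "c"), (400, "a", "b")]
per_bin, unique_links = summarize(bin_interactions(sample))
print(per_bin, len(unique_links))
```

Only participants that appear in at least one edge of a bin are counted as nodes, which matches the caption's definition of active participants.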
Figure 6. Face-to-face network properties at different resolution levels.
Distributions are calculated by aggregating sub-distributions across temporal windows. Differences in rescaled distributions suggest that social dynamics unfold on multiple timescales.
Figure 7. WiFi similarity measures.
Positive predictive value (precision, ratio of number of true positives to number of positive calls, marked with dashed lines) and recall (sensitivity, fraction of retrieved positives, marked with solid lines) as functions of parameters in different similarity measures. A) In 98% of face-to-face meetings derived from Bluetooth, the two devices also sensed at least one common access point. D) Identical strongest access point for two separate mobile devices is a strong indication of a face-to-face meeting.
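To make the precision and recall definitions concrete, here is a small sketch that scores two of the WiFi rules mentioned in the caption ("at least one common access point" and "identical strongest access point") against Bluetooth-derived meetings. The scan format (a dict mapping access point ID to signal strength in dBm) and the toy data are assumptions for illustration only.

```python
def common_ap(scan_a, scan_b):
    """True if the two scans share at least one access point."""
    return bool(set(scan_a) & set(scan_b))


def same_strongest_ap(scan_a, scan_b):
    """True if both scans report the same strongest access point."""
    if not scan_a or not scan_b:
        return False
    return max(scan_a, key=scan_a.get) == max(scan_b, key=scan_b.get)


def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy pairs: (scan of device A, scan of device B, Bluetooth meeting observed?)
pairs = [
    ({"ap1": -40, "ap2": -70}, {"ap1": -45, "ap3": -80}, True),
    ({"ap1": -60, "ap2": -50}, {"ap1": -40}, True),
    ({"ap1": -50}, {"ap1": -85, "ap4": -40}, False),
    ({"ap5": -50}, {"ap6": -50}, False),
]
actual = [met for _, _, met in pairs]
pr_common = precision_recall([common_ap(a, b) for a, b, _ in pairs], actual)
pr_strongest = precision_recall([same_strongest_ap(a, b) for a, b, _ in pairs], actual)
print(pr_common, pr_strongest)
```

On this toy data the looser rule recovers every meeting but produces a false positive, while the stricter rule makes no false calls but misses a meeting, which is the kind of trade-off the figure's curves describe.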
Figure 8. Location and Mobility.
We show the accuracy of the collected samples, radius of gyration of the participants, and identify patterns of collective mobility.
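For readers unfamiliar with the term, the radius of gyration is commonly defined as the root-mean-square distance of a person's recorded positions from their center of mass. The sketch below uses that common definition and assumes positions are already projected to planar coordinates in meters; the caption does not state which exact formulation or coordinate handling the study used.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of points from their centroid.

    points: non-empty list of (x, y) positions in meters (planar approximation).
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one position is required")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# Four corners of a 2 m square: every point lies sqrt(2) m from the centroid.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
rg = radius_of_gyration(square)
print(rg)
```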
Figure 9. Diversity of communication logs.
Diversity is estimated as the set of unique numbers that a person has contacted or been contacted by in the given time period on a given channel. We note a strong correlation in diversity (Pearson correlation of formula image, formula image), whereas the similarity of the sets of nodes is fairly low (on average formula image).
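The two quantities in this caption can be illustrated with a short sketch: diversity as the set of unique contacts per channel, the Pearson correlation of diversity sizes across participants, and the overlap of the contact sets themselves. The caption does not say which set-similarity measure was used; Jaccard similarity is an assumption here, as are the log format and the toy data.

```python
import math


def diversity(log):
    """Set of unique numbers appearing in one participant's log for a channel."""
    return set(log)


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not confirmed by the source)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy logs: contacted numbers per participant on each channel.
calls = {"p1": ["A", "B", "A"], "p2": ["C", "D", "E", "C"], "p3": ["F"]}
sms = {"p1": ["B", "G"], "p2": ["C", "H", "I"], "p3": ["J"]}

people = sorted(calls)
call_sets = [diversity(calls[p]) for p in people]
sms_sets = [diversity(sms[p]) for p in people]

r = pearson([len(s) for s in call_sets], [len(s) for s in sms_sets])
mean_similarity = sum(jaccard(c, s) for c, s in zip(call_sets, sms_sets)) / len(people)
print(r, mean_similarity)
```

The toy data reproduces the qualitative pattern in the caption: the sizes of the contact sets correlate strongly across channels even though the sets themselves overlap little.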
Figure 10. Weekly temporal dynamics of interactions.
All calls and SMS, both incoming and outgoing, were calculated over the entire dataset and averaged per participant and per week, showing the mean number of interactions participants had in a given weekly bin. Light gray denotes 5pm, the time when lectures end at the university; dark gray covers the night between 12 midnight and 8am. SMS is used more for communication outside regular business hours.
Figure 11. Daily activations in three networks.
One day (Friday) in a network showing how different views are produced by observing different channels.
Figure 12. Face-to-face and online activity.
The figure shows data from the 2013 deployment for one representative week. Online: Interactions (messages, wall posts, photos, etc.) between participants on Facebook. Face-to-Face: Only the most active edges, which account for formula image of all traffic, are shown for clarity. Extra Info. F2F: Extra information contained in the Bluetooth data shown as the difference in the set of edges. Extra Info. Online: Additional information contained in the Facebook data.
Figure 13. Network similarity.
Defined as the fraction of ties from one communication channel that can be recovered by considering the top formula image fraction of edges from a different channel. Orange dashed line indicates the maximum fraction of ties the network accounts for. The strongest formula image of face-to-face interactions account for formula image of online ties and formula image of call ties, while formula image of Facebook ties and formula image of call ties are not contained in the Bluetooth data. Between call and Facebook, the formula image strongest call ties account for formula image while in total formula image of Facebook ties are unaccounted for. All values are calculated for interactions that took place in January 2014.
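The similarity measure defined in this caption can be sketched as follows, assuming the source channel is given as a mapping from undirected edge to weight (for example, number of observations) and the target channel as a set of edges. Ranking edges by weight and rounding the cutoff up are assumptions; the caption does not specify either detail.

```python
import math


def top_fraction_edges(weights, fraction):
    """Return the strongest `fraction` of edges, ranked by descending weight."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    k = math.ceil(fraction * len(ranked))  # rounding up is an assumption
    return set(ranked[:k])


def recovered_fraction(source_weights, target_edges, fraction):
    """Fraction of target-channel ties found among the top source-channel edges."""
    if not target_edges:
        return 0.0
    top = top_fraction_edges(source_weights, fraction)
    return len(top & target_edges) / len(target_edges)


def edge(a, b):
    """Undirected edge between two participants."""
    return frozenset((a, b))


# Toy networks: weighted face-to-face edges and unweighted online ties.
f2f = {edge("a", "b"): 10, edge("b", "c"): 5, edge("c", "d"): 2, edge("a", "d"): 1}
online = {edge("a", "b"), edge("c", "d"), edge("a", "e")}

half = recovered_fraction(f2f, online, 0.5)
maximum = recovered_fraction(f2f, online, 1.0)
print(half, maximum)
```

Evaluating at `fraction=1.0` gives the ceiling that the caption's dashed line represents: ties such as `a`-`e` in the toy data never appear in the source channel, so they stay unrecovered regardless of the threshold.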
Figure 14. Personality traits.
Violin plot of personality traits. Summary statistics are: openness formula image, formula image; extraversion formula image, formula image; neuroticism formula image, formula image; agreeableness formula image, formula image; conscientiousness formula image, formula image. Mean values from our deployment (red circles) compared with mean values reported for Western Europe (mixed student and general population) (orange diamonds).
Figure 15. Correlation between personality traits and communication.
Data from the 2013 deployment for N = 488 participants, showing communication only with other study participants. Extraversion, the only significant feature across all networks, is plotted. The red line indicates the mean value within each personality trait. Random spikes are due to the small number of participants with extreme values. E) Pearson correlation between Big Five Inventory personality traits and number of Facebook friends formula image, volume of interactions with these friends formula image, number of friends contacted via voice calls formula image and via SMS formula image. *: formula image, **: formula image, ***: formula image.
