One plus one makes three (for social networks)
- PMID: 22493713
- PMCID: PMC3321038
- DOI: 10.1371/journal.pone.0034740
Erratum in
- PLoS One. 2012;7(4):doi/10.1371/annotation/c2a07195-0843-4d98-a220-b1c5b77a7e1a. Horvát, Emöke-Ágnes [corrected to Horvát, Emőke-Ágnes]
Abstract
Members of social network platforms often choose to reveal private information, and thus sacrifice some of their privacy, in exchange for the manifold opportunities and amenities offered by such platforms. In this article, we show that the seemingly innocuous combination of knowledge of confirmed contacts between members on the one hand and their email contacts to non-members on the other hand provides enough information to deduce a substantial proportion of relationships between non-members. Using machine learning, we achieve an area under the (receiver operating characteristic) curve (AUC) of at least 0.85 for predicting whether two non-members known by the same member are connected or not, even for conservative estimates of the overall proportion of members and of the proportion of members disclosing their contacts.
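As a reminder of what the reported AUC measures: it is the probability that a randomly chosen connected pair of non-members receives a higher classifier score than a randomly chosen unconnected pair, with ties counting one half (the Mann-Whitney formulation). The following is a minimal illustrative sketch. The function name `roc_auc` is hypothetical and this is not the authors' evaluation code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = the two non-members are connected, 0 = they are not.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:          # compare every positive ...
        for n in neg:      # ... with every negative (O(P*N), fine for a sketch)
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so a value of at least 0.85 means most connected pairs are ranked above most unconnected ones.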
Figures
A network of members (black nodes) and non-members. In the toy example, a fraction of the individuals are members. The relevant subset of non-members (red nodes), i.e., those in contact with at least one member, is distinguished from the other non-members (gray nodes). A fraction of the members have disclosed their outside social contacts. Knowing the set of edges between members (black, bi-directed) and the set of edges from members to non-members (green) is enough to infer a substantial fraction of the edges between non-members (red edges).
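The setting in this caption can be made concrete with a short sketch: starting from a hidden, complete contact graph, it derives what the platform observes (edges between members, plus the outside contacts of members who disclosed them) and the candidate pairs of non-members known by the same disclosing member, each labeled by whether the two are actually connected. The function name `observed_view` and the dict-of-sets graph representation are illustrative assumptions, not the authors' code.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def observed_view(adj: Dict[str, Set[str]], members: Set[str], disclosing: Set[str]):
    """Split a hidden, undirected contact graph into what the platform can observe.

    adj: full (hidden) adjacency, node -> set of neighbors, assumed symmetric.
    members: nodes that joined the platform.
    disclosing: members who uploaded their outside contacts (subset of members).
    """
    # Edges between members (black, bi-directed in the figure).
    member_edges: Set[FrozenSet[str]] = {
        frozenset((u, v)) for u in members for v in adj[u] if v in members
    }
    # Edges from disclosing members to non-members (green in the figure).
    outside_edges: Set[Tuple[str, str]] = {
        (m, x) for m in disclosing for x in adj[m] if x not in members
    }
    # Candidate pairs: two non-members known by the same disclosing member.
    candidates: Set[Tuple[str, str]] = set()
    for m in disclosing:
        known = sorted(x for x in adj[m] if x not in members)
        candidates.update(combinations(known, 2))
    # Ground truth: is the pair actually connected (a red edge in the figure)?
    labels = {(a, b): b in adj[a] for (a, b) in candidates}
    return member_edges, outside_edges, labels


adj = {"A": {"B", "w", "x", "y"}, "B": {"A", "z"},
       "w": {"A"}, "x": {"A", "y"}, "y": {"A", "x"}, "z": {"B"}}
print(observed_view(adj, members={"A", "B"}, disclosing={"A"})[2])
```

Here only member A discloses, so non-member z (known only to B) never appears in a candidate pair, while w, x and y form three candidate pairs, exactly one of which is connected.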
Black nodes with a white core mark the nodes from which the propagation started. Other members are black and relevant non-members red; for readability, arrows are omitted, but black edges are bidirectional, while green edges point from black to red nodes. With BFS and DFS, the network is explored from a single starting node (marked by a white circle); with RW and EN, the propagation is launched from several nodes; and with RS, all selected nodes can be regarded as starting nodes.
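For BFS and DFS, which the caption describes as exploring the network from a single starting node, a member set could be grown as in the sketch below. It assumes an undirected adjacency map and a stopping rule of a fixed target number of members; the function name `recruit` and the tie-breaking by sorted node name are hypothetical choices. RW, EN and RS are not sketched, since the caption does not define their procedures.

```python
from collections import deque
from typing import Dict, List, Set


def recruit(adj: Dict[str, Set[str]], start: str, n_members: int,
            order: str = "bfs") -> List[str]:
    """Grow a member set from `start` by BFS or DFS until n_members are reached."""
    if order not in ("bfs", "dfs"):
        raise ValueError("order must be 'bfs' or 'dfs'")
    members: List[str] = []
    visited: Set[str] = set()
    frontier = deque([start])
    while frontier and len(members) < n_members:
        # Queue (FIFO) gives breadth-first order, stack (LIFO) depth-first order.
        node = frontier.popleft() if order == "bfs" else frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        members.append(node)
        # sorted() only makes the traversal deterministic for the example.
        for nb in sorted(adj[node]):
            if nb not in visited:
                frontier.append(nb)
    return members


adj = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
print(recruit(adj, "a", 4, "bfs"), recruit(adj, "a", 4, "dfs"))
```

Because both strategies recruit along existing contacts, the resulting member set is connected, which is what distinguishes them from a uniform random choice of members.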
Neighborhoods of two non-members. In the example, one neighbor is exclusive to the first non-member, others are exclusive to the second, and the remaining ones are common neighbors of both. Our features comprise the absolute number of edges among common neighbors (black, dashed edges), among exclusive neighbors (black, straight edge), within the joint neighborhood (all black edges between the neighbors of the two nodes), and between an exclusive and a common neighbor (black, dotted edges). For each of these counts we also added its normalized value, i.e., the count divided by the number of possible edges between the respective neighbors.
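The four edge-count features and their normalized versions can be sketched as follows. Assumptions, all illustrative: `nbrs` maps each non-member to the disclosing members who listed it, `adj` holds the member-member edges, "exclusive neighbors" is read as the union of the neighbors exclusive to either node, and "possible edges" is k(k-1)/2 inside a set of k nodes, or |X|·|C| between the exclusive set X and the common set C. The function names and the 0.0 fallback for empty sets are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set


def _edges_within(nodes: Iterable[str], adj: Dict[str, Set[str]]) -> int:
    """Number of member-member edges with both endpoints in `nodes`."""
    return sum(1 for a, b in combinations(sorted(nodes), 2) if b in adj.get(a, set()))


def _edges_between(xs: Set[str], ys: Set[str], adj: Dict[str, Set[str]]) -> int:
    """Number of edges with one endpoint in xs and the other in ys (disjoint sets)."""
    return sum(1 for a in xs for b in ys if b in adj.get(a, set()))


def _possible_within(k: int) -> int:
    """Number of possible undirected edges among k nodes."""
    return k * (k - 1) // 2


def pair_features(u: str, v: str, nbrs: Dict[str, Set[str]],
                  adj: Dict[str, Set[str]]) -> Dict[str, float]:
    common = nbrs[u] & nbrs[v]
    joint = nbrs[u] | nbrs[v]
    exclusive = joint - common  # exclusive to u plus exclusive to v
    raw = {
        "common": _edges_within(common, adj),
        "exclusive": _edges_within(exclusive, adj),
        "joint": _edges_within(joint, adj),
        "exclusive_common": _edges_between(exclusive, common, adj),
    }
    possible = {
        "common": _possible_within(len(common)),
        "exclusive": _possible_within(len(exclusive)),
        "joint": _possible_within(len(joint)),
        "exclusive_common": len(exclusive) * len(common),
    }
    features: Dict[str, float] = dict(raw)
    for name, count in raw.items():
        features[name + "_norm"] = count / possible[name] if possible[name] else 0.0
    return features


nbrs = {"u": {"A", "B", "C"}, "v": {"B", "C", "D"}}
adj = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B"}, "D": {"A"}}
print(pair_features("u", "v", nbrs, adj))
```

Under this reading, the joint-neighborhood count always equals the sum of the other three counts, since every edge inside the joint neighborhood lies among common neighbors, among exclusive neighbors, or between the two groups.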
Results as a function of the disclosure parameter, shown in an upper and a lower row for two different settings. Black triangles denote data points where the plotted value was smaller than the corresponding fraction of positive samples among all samples.
Values for each of the five member recruitment models at a fixed parameter setting. The x- and y-axes show on which network the random forest was trained and tested, respectively. The white field indicates that there were too few edge samples to train the classifier reliably.
