PLoS One. 2022 Nov 22;17(11):e0277182.
doi: 10.1371/journal.pone.0277182. eCollection 2022.

Structural invariants and semantic fingerprints in the "ego network" of words

Kilian Ollivier et al. PLoS One. 2022.

Abstract

Well-established cognitive models from anthropology have shown that, because cognitive constraints limit our "bandwidth" for social interactions, humans organize their social relations according to a regular structure. In this work, we postulate that similar regularities can be found in other cognitive processes, such as those involving language production. To investigate this claim, we analyse a dataset of tweets from a heterogeneous group of Twitter users (regular users and professional writers). Using a methodology similar to the one that uncovered the well-established social cognitive constraints, we find regularities at both the structural and the semantic level. At the structural level, we find that a concentric layered structure (which we call the ego network of words, by analogy with the ego network of social relationships) captures very well how individuals organise the words they use. Layer sizes grow regularly when moving outwards, each layer being approximately 2-3 times larger than the previous one, and the second- and third-outermost layers consistently account for approximately 60% and 30% of the used words, irrespective of the user's number of layers. At the semantic level, each ring of each ego network is described by a semantic profile, which captures the topics associated with the words in the ring. We find that ring #1 has a special role in the model: it is semantically the most dissimilar and the most diverse of the rings. We also show that the topics that are important in the innermost ring are also predominant in each of the other rings, as well as in the entire ego network. In this respect, ring #1 can be seen as the semantic fingerprint of the ego network of words.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The ego network of social relationships.
The green dot represents the ego and the black dots the alters with whom the ego maintains an active social relationship. Unlike a ring, a layer also contains the alters of all inner layers.
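The layer/ring distinction in Fig 1 carries over to the ego network of words described in the abstract: rings are disjoint, and each layer is the union of a ring with all the rings inside it. A minimal sketch, assuming rings are given as word counts ordered from innermost to outermost (the ring sizes are hypothetical), derives layer sizes as running sums:

```python
from itertools import accumulate
from typing import List


def layers_from_rings(ring_sizes: List[int]) -> List[int]:
    """Return cumulative layer sizes, innermost first.

    Rings are disjoint; layer i is ring i plus every ring inside it,
    so its size is the running sum of ring sizes up to i.
    """
    return list(accumulate(ring_sizes))


# Hypothetical ring sizes (distinct words per ring), innermost first.
print(layers_from_rings([5, 10, 30, 90]))  # [5, 15, 45, 135]
```

The size of the outermost layer is then the user's total number of distinct words, which is how Fig 7 uses it.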
Fig 2. Available timelines.
Number of selected timelines as a function of the observation window.
Fig 3. Tweets per user.
Average number of tweets as a function of the observation window. The Pearson correlation coefficient is at least 0.98 for all four datasets.
Fig 4. Optimal number of clusters.
The clusters are obtained by applying Mean Shift to log-transformed frequencies. The most frequent number of clusters is highlighted in red.
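Fig 4 states that clusters are obtained by applying Mean Shift to log-transformed frequencies. The sketch below is a from-scratch, flat-kernel, one-dimensional Mean Shift written only for illustration. The bandwidth, the mode-merging rule, and the word counts are assumptions; this section does not describe the authors' implementation or how they chose the bandwidth. Each surviving mode is one cluster, so the number of modes is the user's number of clusters.

```python
import math
from typing import Dict, List


def mean_shift_modes(values: List[float], bandwidth: float,
                     tol: float = 1e-6, max_iter: int = 300) -> List[float]:
    """Flat-kernel mean shift in 1D; returns the distinct modes found."""
    modes: List[float] = []
    for start in values:
        x = start
        for _ in range(max_iter):
            # Points within `bandwidth` of x (x always stays within reach of a data point).
            window = [v for v in values if abs(v - x) <= bandwidth]
            shifted = sum(window) / len(window)
            if abs(shifted - x) < tol:
                break
            x = shifted
        # Treat modes closer than half the bandwidth as the same cluster.
        if all(abs(x - m) > bandwidth / 2 for m in modes):
            modes.append(x)
    return sorted(modes)


# Hypothetical word frequencies for a single user.
freqs: Dict[str, int] = {"the": 120, "virus": 100, "vaccine": 90,
                         "mask": 12, "test": 10, "lab": 9,
                         "bat": 1, "zoo": 1, "sky": 2}
log_freqs = [math.log(c) for c in freqs.values()]
modes = mean_shift_modes(log_freqs, bandwidth=1.0)
print(len(modes))  # 3
```

Assigning each word to its nearest mode would then group the vocabulary into frequency classes, with the highest-frequency cluster as the innermost ring; this mapping is an assumption based on the frequency-based ordering of rings described in Fig 8. In practice, scikit-learn's `MeanShift` (with `estimate_bandwidth`) is a ready-made alternative.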
Fig 5. Average layer size.
Each panel captures egos with a different optimal number of clusters. Error bars correspond to the 95% confidence intervals.
Fig 6. Scaling ratio.
Each panel captures egos with a different optimal number of clusters. Error bars correspond to the 95% confidence intervals.
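The scaling ratio plotted in Fig 6 is taken here, as an assumption following the usual ego-network convention, to be the size of each layer divided by the size of the layer immediately inside it; the abstract reports values of roughly 2-3. A minimal sketch with hypothetical layer sizes:

```python
from typing import List


def scaling_ratios(layer_sizes: List[int]) -> List[float]:
    """Ratio between each layer and the layer just inside it (innermost first)."""
    return [outer / inner for inner, outer in zip(layer_sizes, layer_sizes[1:])]


# Hypothetical cumulative layer sizes, innermost first.
print(scaling_ratios([5, 15, 45, 135]))  # [3.0, 3.0, 3.0]
```

A user with k layers yields k - 1 ratios, which is why each panel groups egos by their optimal number of clusters.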
Fig 7. Size of the outermost layer vs. individual layer sizes: linear regression plots.
The x-axis corresponds to the total number of unique words used by each user (corresponding to the size of the outermost layer), the y-axis to the individual layer sizes.
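Each regression line in Fig 7 relates a layer's size to the user's total vocabulary. The ordinary-least-squares sketch below uses hypothetical data; its slope reads as the fraction of a user's distinct words that fall in that layer, so a slope near 0.6 would match one of the proportions reported in the abstract.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical users: vocabulary size (outermost layer) vs. size of one inner layer.
vocab_sizes = [100.0, 200.0, 400.0, 800.0]
layer_sizes = [61.0, 119.0, 242.0, 478.0]
slope, intercept = fit_line(vocab_sizes, layer_sizes)
print(round(slope, 3))  # 0.597
```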
Fig 8. Obtaining the semantic profile of the rings of an ego network.
(1) The ego network’s rings organize a user’s vocabulary based on word frequencies. (2) The occurrences of a given word in the user’s timeline most likely come from different tweets. (3) Tweets are classified by topic using the BERTopic framework. (4) Each word occurrence is assigned the topic of the tweet it belongs to. (5) Considering a ring as a multiset of words (with repetitions), its semantic profile is the distribution of topics among those words.
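Steps (2)-(5) of Fig 8 can be sketched as below, assuming tweets are already tokenised and each tweet already carries a topic label (the BERTopic step itself is not shown). Following BERTopic's convention, label -1 marks outlier tweets; skipping them mirrors the hard-clustering choice explained under Fig 11. The timeline and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

OUTLIER = -1  # BERTopic's label for documents not assigned to any topic


def semantic_profile(ring: Set[str], tweets: List[List[str]],
                     tweet_topics: List[int]) -> Dict[int, float]:
    """Topic distribution over all occurrences of the ring's words.

    Each occurrence inherits the topic of the tweet it appears in;
    tweets labelled as outliers are skipped (hard clustering).
    """
    counts: Counter = Counter()
    for tokens, topic in zip(tweets, tweet_topics):
        if topic == OUTLIER:
            continue
        for token in tokens:
            if token in ring:
                counts[topic] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


# Hypothetical tokenised timeline with one topic label per tweet.
tweets = [["virus", "vaccine", "virus"], ["game", "virus"],
          ["game", "score"], ["lab", "virus"]]
topics = [0, 1, 1, OUTLIER]
profile = semantic_profile({"virus"}, tweets, topics)
print(profile)  # topic 0 gets 2 of the 3 counted occurrences of "virus"
```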
Fig 9. 2D visualization of the HDBSCAN results on the journalists dataset with both hard and soft clustering.
265 clusters are found (they are the same in both cases). With hard clustering, each point either belongs to a single cluster (colored points) or is classified as an outlier (grey points); with soft clustering, each point is assigned a likelihood of belonging to each cluster and takes the color of the cluster it most likely belongs to.
Fig 10. Number of topics vs. average topic similarity.
The threshold of one hundred topics is marked with the dashed red line. This threshold is situated at the end of the bend for specialized datasets, and in the middle of the bend for both random datasets.
Fig 11. Semantic profile illustration.
Each ring is associated with a topic distribution. Note: two different semantic profiles can be built, depending on whether topics are assigned using hard or soft clustering. Soft clustering (and thus the inclusion of outliers) does not improve the reliability of the analysis: it gives too much weight to noisy data, which favors the emergence of very general “super topics” that dominate all semantic profiles. We therefore present in Section 5.3 only the results obtained with hard clustering. S1 Appendix discusses soft versus hard clustering in detail and motivates why hard clustering is better suited to our analysis.
Fig 12.
Average number of topics (a), number of word occurrences (b), and normalised number of topics (c) in each ring of the ego network. For “null” ego networks, we report only the normalised number of topics (d).
Fig 13. Entropy of the semantic profiles per ring.
Real-life ego networks (left) vs null model ego networks (right).
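Fig 13 compares the entropy of the semantic profiles ring by ring. The logarithm base is not stated in this section; the sketch assumes Shannon entropy in bits. Higher values mean topics are spread more evenly over the ring's word occurrences, which is one way to read the abstract's statement that ring #1 is the most diverse ring.

```python
import math
from typing import Dict


def profile_entropy(profile: Dict[int, float]) -> float:
    """Shannon entropy (in bits) of a ring's topic distribution."""
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)


print(profile_entropy({0: 0.5, 1: 0.5}))  # 1.0
print(profile_entropy({0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}))  # 2.0
```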
Fig 14. Null model example.
Ring sizes and word occurrences are preserved, while the words are shuffled. In this toy example: O(e, r2) = 3 + 2, o(virus, e) = 5, o′(virus, e) = 1.
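One reading of the null model in Fig 14, assumed here rather than taken from the paper's code, keeps each ring's number of word slots and the occurrence count attached to each slot (so O(e, r) is unchanged) and randomly reassigns which word fills each slot. A word's count in the null network (o′) then generally differs from its original count (o), as with virus in the toy example. How shuffled words receive topics is not described in this section, so the sketch stops at the shuffle; the rings are hypothetical.

```python
import random
from typing import List, Tuple

# A ring is a list of (word, occurrences) pairs.
Ring = List[Tuple[str, int]]


def null_ego_network(rings: List[Ring], seed: int = 0) -> List[Ring]:
    """Shuffle words across rings, keeping every ring's slot count and
    the occurrence count attached to each slot."""
    rng = random.Random(seed)
    words = [word for ring in rings for word, _ in ring]
    rng.shuffle(words)
    shuffled = iter(words)
    return [[(next(shuffled), count) for _, count in ring] for ring in rings]


# Hypothetical ego network with two rings.
rings = [[("virus", 5), ("mask", 4)], [("game", 3), ("score", 2), ("lab", 1)]]
null = null_ego_network(rings, seed=42)
print(null)
```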
Fig 15. Jensen-Shannon distance.
Average JS distance between the rings.
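Fig 15 reports the average Jensen-Shannon (JS) distance between rings. The JS distance is the square root of the JS divergence; the self-contained sketch below works on sparse topic dictionaries and assumes base-2 logarithms, so values lie between 0 (identical profiles) and 1 (profiles sharing no topic). SciPy's `scipy.spatial.distance.jensenshannon` computes the same quantity on dense arrays when called with `base=2`.

```python
import math
from typing import Dict


def js_distance(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Jensen-Shannon distance (base 2, so in [0, 1]) between two profiles."""
    topics = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in topics}

    def kl_to_m(a: Dict[int, float]) -> float:
        # Kullback-Leibler divergence KL(a || m); zero-probability terms vanish.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in topics if a.get(t, 0.0) > 0)

    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(0.0, 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)))


print(js_distance({0: 1.0}, {1: 1.0}))  # 1.0
```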
Fig 16. Average strength of ring #1’s important topics in the semantic profile of each ring and of the whole ego network.
Each bar represents the semantic profile of one ring (the last bar represents the whole ego network); the blue part is the share covered by the most important topics of ring #1, whose average number |Ur1| is written in white.
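Fig 16 measures how much of each ring's profile is covered by the important topics of ring #1 (the set Ur1). This section does not define "important"; purely as a hypothetical stand-in, the sketch below takes ring #1's most probable topics until they cover half of its profile, then measures their strength in another profile as the summed probability of those topics. The profiles are hypothetical.

```python
from typing import Dict, Set


def important_topics(profile: Dict[int, float], coverage: float = 0.5) -> Set[int]:
    """Hypothetical rule: most probable topics until `coverage` is reached."""
    chosen: Set[int] = set()
    covered = 0.0
    for topic, p in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage:
            break
        chosen.add(topic)
        covered += p
    return chosen


def strength(profile: Dict[int, float], topics: Set[int]) -> float:
    """Share of a semantic profile covered by the given topics."""
    return sum(profile.get(t, 0.0) for t in topics)


ring1 = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
ring3 = {0: 0.2, 1: 0.1, 4: 0.7}
u_r1 = important_topics(ring1)  # {0, 1}
print(round(strength(ring3, u_r1), 2))  # 0.3
```

The same `strength` function applies unchanged to Fig 17 by passing the ego network's important topics (Ue) instead of Ur1.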
Fig 17. Average strength of the ego network’s important topics in the semantic profile of each ring.
The blue part of the stacked bar represents the share covered by the important topics in Ue. The average number of topics |Ue| is specified in white.
