Int J Disaster Risk Reduct. 2021 Feb 15:54:102032.
doi: 10.1016/j.ijdrr.2020.102032. Epub 2021 Jan 11.

A multi-modal approach towards mining social media data during natural disasters - a case study of Hurricane Irma

Somya D Mohanty et al. Int J Disaster Risk Reduct.

Abstract

Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data-learned approaches to mine and filter information from streaming social media data from Hurricane Irma's landfall in Florida, USA. We use 54,383 Twitter messages (out of 784K geolocated messages) from 16,598 users from Sept. 10-12, 2017 to develop four independent models to filter data for relevance: 1) a geospatial model based on forcing conditions at the place and time of each tweet, 2) an image classification model for tweets that include images, 3) a user model to predict the reliability of the tweeter, and 4) a text model to determine if the text is related to Hurricane Irma. All four models are independently tested and can be combined to quickly filter and visualize tweets based on user-defined thresholds for each submodel. We envision that this type of filtering and visualization routine can be useful as a base model for data capture from noisy sources such as Twitter. The data can then be used by policy makers, environmental managers, emergency managers, and domain scientists interested in finding tweets with specific attributes to use during different stages of the disaster (e.g., preparedness, response, and recovery), or for detailed research.

Keywords: data mining; machine learning; natural disaster; social media.
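
The abstract describes combining the four submodel scores through user-defined thresholds. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each submodel emits a score in [0, 1], that a tweet is kept only if every available score meets its threshold, and that a tweet without an image simply skips the image threshold. The `TweetScores` class, the function names, and all score values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TweetScores:
    """Hypothetical container for one tweet's submodel scores (assumed in [0, 1])."""
    tweet_id: str
    geo: float
    user: float
    text: float
    image: Optional[float] = None  # None when the tweet carries no image


def passes_thresholds(tweet: TweetScores, thresholds: Dict[str, float]) -> bool:
    """Return True if every available submodel score meets its threshold."""
    for name, minimum in thresholds.items():
        score = getattr(tweet, name)
        if score is None:
            # Assumption: a submodel with no score (e.g. no image) is skipped.
            continue
        if score < minimum:
            return False
    return True


def filter_tweets(tweets: List[TweetScores],
                  thresholds: Dict[str, float]) -> List[TweetScores]:
    """Keep only the tweets that pass all user-defined thresholds."""
    return [t for t in tweets if passes_thresholds(t, thresholds)]


# Made-up example scores, for illustration only.
tweets = [
    TweetScores("a", geo=0.9, user=0.8, text=0.7, image=0.95),
    TweetScores("b", geo=0.9, user=0.2, text=0.9),
    TweetScores("c", geo=0.6, user=0.7, text=0.8),
]
thresholds = {"geo": 0.5, "user": 0.5, "text": 0.5, "image": 0.5}
kept = filter_tweets(tweets, thresholds)
print([t.tweet_id for t in kept])
```

Raising or lowering a single entry in `thresholds` tightens or relaxes one submodel without touching the others, which is one way to read the "user-defined thresholds for each submodel" in the abstract.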

Conflict of interest statement

Declaration of interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1: The path of Hurricane Irma in September 2017 (orange line), the extent of tropical storm force winds (pink outline), and the location of all 784K geolocated tweets used as the basis for this study (black dots).
Figure 2: Overall information flow model. The Metadata Extraction stage develops variables from the raw Twitter data, the Filtering stage uses the 1) Geospatial, 2) User, 3) Image, and 4) Text analysis modules to score tweets, and the Visualization stage displays the image posted at each location alongside Google Street View.
Figure 3: Precipitation and wind speed in relation to the distance from Hurricane Irma’s eye.
Figure 4: Cumulative Distribution Function (CDF) for Min-Max Normalization, Log, and Box-Cox transformed geospatial scores for the nine models. The common legend of all three figures is shown in Figure c.
Figure 5: Cumulative Distribution Function (CDF) and F1-Scores for the top five geospatial models.
Figure 6: Area Under the Receiver Operating Characteristic (AU-ROC) curves for the Inception V3, VGG net, ResNet architecture, and Tuned Inception V3 models for binary classification of images (hurricane related versus non-hurricane related).
Figure 7: Area Under the Receiver Operating Characteristic (AU-ROC) curves for the Tuned Inception V3 model for multi-label annotation of images: 1) ‘Flood’, 2) ‘Wind’, and 3) ‘Destruction’.
Figure 8: AU-ROC curves for Random Forest, Gradient Boosted, and Logistic Regression classifiers in predicting Verified users.
Figure 9: AU-ROC curves for the text models: 1) Cosine Similarity of Tweet Vector Sum (CSTVS), 2) Dot Product of Search Term Vector and Tweet Vector Sum (DP), 3) Mean Cosine Similarity (MCS), 4) Sum of Cosine Similarity over Square Root of Token Count (SCSSC).
Figure 10: Performance of the user Random Forest Classifier in the binary classes and feature importance metrics.
Figure 11: CDF of the Overall Model and percentage of tweets passing different model thresholds.
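
The four text measures named in the Figure 9 caption can be sketched from their names alone. The block below is an interpretation, not the authors' code: it assumes a search term and each tweet token are already represented as equal-length embedding vectors, and it reads CSTVS as the cosine similarity between the search term vector and the sum of the token vectors. The function names and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined here as 0.0 if either vector is all zeros."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def text_scores(search: Vector, tokens: List[Vector]) -> Dict[str, float]:
    """Compute the four relevance measures for one tweet (assumed definitions)."""
    if not tokens:
        raise ValueError("tweet has no token vectors")
    # Element-wise sum of all token vectors in the tweet.
    tweet_sum = [sum(component) for component in zip(*tokens)]
    # Cosine similarity between the search term and each individual token.
    per_token = [cosine(search, t) for t in tokens]
    return {
        "CSTVS": cosine(search, tweet_sum),
        "DP": dot(search, tweet_sum),
        "MCS": sum(per_token) / len(tokens),
        "SCSSC": sum(per_token) / math.sqrt(len(tokens)),
    }


# Toy 2-D vectors, for illustration only.
scores = text_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(scores)
```

Under these assumed definitions, MCS divides by the token count while SCSSC divides by its square root, so SCSSC penalizes long tweets less heavily than MCS does.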
