Published on Snurblog (http://snurb.info)

The Limitations of Twitter as a Data Source

Mon, 28/05/2018 - 22:16 — Snurb
'Big Data' [1]
Social Media [2]
Twitter [3]
ICA 2018 [4]

The next speaker in this ICA 2018 [5] session is Fabian Pfaffenberger, who also highlights the unreliability of Twitter data. The API’s 1% sample is extremely biased, and the search API is also unreliable in what it delivers. Historical data is especially incomplete: the search API returns only tweets posted in the past six to seven days, and it excludes deleted tweets as well as tweets from accounts that have since been deleted or suspended.
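To make the search window limitation concrete, here is a minimal sketch of how a researcher might check which part of a desired collection period is still retrievable. It does not call any Twitter API; the function name `retrievable_range` and the fixed seven-day `SEARCH_WINDOW` are assumptions for illustration, and the actual cutoff may vary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumption: the search API reaches back roughly seven days.
SEARCH_WINDOW = timedelta(days=7)


def retrievable_range(
    start: datetime, end: datetime, now: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Clip a desired collection period to the assumed search window.

    Returns the part of [start, end] that is still searchable at `now`,
    or None if the whole period is already out of reach.
    """
    earliest = now - SEARCH_WINDOW
    clipped_start = max(start, earliest)
    clipped_end = min(end, now)
    if clipped_start >= clipped_end:
        return None
    return clipped_start, clipped_end


# Hypothetical study period, checked on the day this post was written.
now = datetime(2018, 5, 28, tzinfo=timezone.utc)
wanted_start = datetime(2018, 5, 1, tzinfo=timezone.utc)
wanted_end = datetime(2018, 5, 25, tzinfo=timezone.utc)

print(retrievable_range(wanted_start, wanted_end, now))
```

Under these assumptions, only the last few days of the study period can still be searched; everything earlier would have had to be captured as it was posted.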

User information is also incomplete, and geodata is largely unreliable and available for only some 1% of all tweets. Further, genuine users are mixed with bots in the datasets, so better bot identification tools are sorely needed. And whatever we encounter may not be representative in any meaningful way: Twitter is already a niche medium, and Twitter users may be especially interested in engaging with leading users. Twitter's userbase also appears to be stagnating at this stage.
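As a rough illustration of the geodata point, a collected dataset can be checked for how many tweets carry any location information at all. This is a sketch only: it assumes tweets are stored as dictionaries with optional `coordinates` and `place` fields, and the sample records are invented for the example.

```python
from typing import Any, Iterable, Mapping


def geodata_share(tweets: Iterable[Mapping[str, Any]]) -> float:
    """Return the fraction of tweets carrying any geodata (0.0 if empty)."""
    total = 0
    tagged = 0
    for tweet in tweets:
        total += 1
        # Assumption: a tweet counts as geotagged if either field is non-empty.
        if tweet.get("coordinates") or tweet.get("place"):
            tagged += 1
    return tagged / total if total else 0.0


# Hypothetical sample records, invented purely for illustration.
sample = [
    {"id": 1, "coordinates": None, "place": None},
    {"id": 2, "coordinates": {"type": "Point", "coordinates": [153.02, -27.47]}},
    {"id": 3},
    {"id": 4, "coordinates": None, "place": {"full_name": "Brisbane, Queensland"}},
]

print(f"{geodata_share(sample):.0%} of sample tweets carry geodata")
```

In a real collection the share would be expected to sit far below this toy sample, which is exactly why location-based analyses rest on a small and possibly unrepresentative subset.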

Except where otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 License.

Source URL: http://snurb.info/node/2355

Links
[1] http://snurb.info/taxonomy/term/142
[2] http://snurb.info/taxonomy/term/125
[3] http://snurb.info/taxonomy/term/121
[4] http://snurb.info/taxonomy/term/168
[5] https://www.icahdq.org/general/custom.asp?page=Conference