A friend recently described the vibe of my recent newsletters as “a new thing has happened online, and I am worried”. Well sorry everyone, but there’s another thing to worry about. It’s this email that I, and everyone else registered as a Twitter developer, received last month:
I immediately posted this to a group chat of social media researchers with the caption ‘get f***ed Twitter’.
‘Are we done, then?’ asked a fellow researcher.
But what was so bad about this email (besides the horrendously Silicon Valley language)? For that, I need to take you into some of the secrets* of social media research. Read on to find out more.
Next time, though, I promise to write about something less worrisome.
*Not actually secrets.
Unpacking The Internet’s Boxes
Imagine you’re an historian going through boxes of old papers. The boxes are labelled, but not in a super helpful way – there’s no central list of labels, and sometimes people have changed labelling practices halfway through. There’s so much material that even a whole team of people would never be able to read it all. Also some of the papers mention boxes that you haven’t got access to. Also some of the boxes contain harmful material, and you should really try and find it quickly. Oh and also the stuff inside the boxes is multiplying a hundredfold every second, and sometimes random new boxes spawn out of nowhere. (The latter is something historians have, to my knowledge, not had to deal with).
That’s what being an internet researcher can sometimes feel like. Obviously searching for online material is a lot easier than travelling to an archive and reading through papers (God bless search engines and Ctrl-F). But, like a poorly-labelled archive, websites can be surprisingly complicated. To see for yourself, try going to a web page, right-clicking, and choosing ‘view source’; you’ll see the intimidating mess of material which sits behind a web page. Once you’ve figured out how it works, you can write a program to collect data from the page; these are known as ‘scrapers’. But in all the mess there are likely to be bugs; also the people who run the web pages might change things and break your program, or they may actually forbid scraping. Now imagine doing all that for multiple web pages. While they multiply and maybe help people do bad things. It’s slightly scary.
(And all that’s ignoring how much online activity happens behind sign-in screens, paywalls, apps, etc., known as The Deep Web.)
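To make the fragility of scrapers concrete, here is a minimal sketch in Python. It is an illustration only: the page, the `headline` class name and the "redesign" are all made up, and a real scraper would also need to fetch the page and respect the site's rules on scraping.

```python
from html.parser import HTMLParser


class HeadlineScraper(HTMLParser):
    """Collects the text of every <h2 class="headline"> element."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "h2" and dict(attrs).get("class") == "headline":
            self._in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite
        if self._in_headline:
            self.headlines[-1] += data


def scrape_headlines(html):
    scraper = HeadlineScraper()
    scraper.feed(html)
    scraper.close()
    return [h.strip() for h in scraper.headlines]


# A made-up page standing in for the 'view source' mess
PAGE = """
<html><body>
  <div class="nav">Home | About</div>
  <h2 class="headline">First story</h2>
  <p>Some text</p>
  <h2 class="headline">Second story</h2>
</body></html>
"""

# The same page after a hypothetical redesign renames the class
REDESIGNED = PAGE.replace('class="headline"', 'class="title"')

print(scrape_headlines(PAGE))        # ['First story', 'Second story']
print(scrape_headlines(REDESIGNED))  # []
```

The second result is the worrying one: nothing crashes, the scraper just quietly returns nothing once the site owners rename one label.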
Fortunately, there’s sometimes an equivalent of a helpful librarian – someone you can ask ‘I want this, how do I get it?’. These are called Application Programming Interfaces, or APIs. A well-maintained API has a nice clear list of terms you can use in code to make requests of Twitter, or reddit, or a government data store, or whatever.
(You can also use APIs to do things that aren’t about research, e.g. build a bot or autosave your ‘liked’ content in a spreadsheet. But let’s focus on research).
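For a rough feel of what asking the librarian looks like in code, here is a small Python sketch. Everything platform-specific in it is an assumption for illustration: the `api.example.com` endpoint, the `q` and `limit` parameters, and the shape of the response are invented, not any real platform's API. It also skips the actual network call and the authentication most APIs require.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; a real API documents its own URLs and parameters
BASE_URL = "https://api.example.com/v1/posts/search"


def build_request_url(query, limit=10):
    """Express 'I want posts about X' using the API's documented terms."""
    return BASE_URL + "?" + urlencode({"q": query, "limit": limit})


def parse_posts(response_body):
    """Pull the fields we care about out of a structured JSON response."""
    payload = json.loads(response_body)
    return [(post["author"], post["text"]) for post in payload["posts"]]


# A made-up response in the kind of tidy format an API returns
SAMPLE_RESPONSE = (
    '{"posts": ['
    '{"author": "alice", "text": "hello world"}, '
    '{"author": "bob", "text": "second post"}'
    "]}"
)

print(build_request_url("climate change", limit=2))
# https://api.example.com/v1/posts/search?q=climate+change&limit=2
print(parse_posts(SAMPLE_RESPONSE))
# [('alice', 'hello world'), ('bob', 'second post')]
```

Compared with a scraper, nothing here depends on how the web page happens to be laid out: the request uses named terms and the answer comes back already labelled.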
APIs make data collection much quicker, cleaner, and more stable. They allow us to focus on actually doing analysis, and keeping up with all the fast-moving (and sometimes very bad) things happening on the internet. Boy it’d be a shame if something were to happen to them.
Get F***ed, Twitter
So what were the API updates that Twitter was “so excited to share” in their email? The answer: They were closing their free API. With less than a week’s notice. At a time when plenty of research (including this, that I helped CASM Technology and the Institute for Strategic Dialogue do for BBC Panorama) shows that harmful accounts are rapidly multiplying on Twitter. Over a month later we’re still not quite sure what’s happening – it’s still free access for now – but the figure of a ‘small package’ starting at $42,000 per month has been mentioned.
(The exact price is probably a reference to ‘420’ being an online meme for drugs – Musk allegedly paid $160 million extra for Twitter in order to get that number into the share price of $54.20.)
This is a severe blow for social media research. Until recently, Twitter had a very open API and you could do all sorts of studies of how content was being shared, cool network graphs of who follows who, large-scale language analysis… lots of cool stuff. Arguably too cool; I and others have argued many times that Twitter gets disproportionate levels of attention (remember that it’s got a similar number of users as reddit, and is far smaller than Facebook, YouTube, etc.). The variety of research Twitter made possible probably contributed to that. Nonetheless, it helped us expose and understand all sorts of worrying networks and narratives and behaviours, which can then help us address problems of the web more broadly.
The Twitter API closure is (like a lot of Musk’s tenure) sudden and unnecessarily chaotic – more on that later. But it’s also part of a longer trend of platforms becoming harder to research through APIs. Following privacy scandals, Facebook (later Meta) gradually restricted API access to Facebook and Instagram data. Eventually they required that you go via a tool called Crowdtangle, a previously independent research tool which they had bought in 2016. They then restricted Crowdtangle’s use to publishers and “academics and researchers in specific fields”. Then it got gradually buggier and harder to use. In 2022 reporting confirmed the suspicions of many researchers, that Crowdtangle was being starved of support, wound down, and destined for closure.
Newer platforms tend to appear with API limits already baked in. For instance Telegram has an API, but it can’t access groups set as ‘private’. That sounds sensible until you realise that so-called ‘private’ groups can sometimes have thousands of members – so it’s basically a free pass for various dubious communities on Telegram. TikTok doesn’t yet have an API – last year they excitedly revealed they’re introducing one, but it will be limited to just US universities (initially at least). Of the remaining major platforms, reddit’s API is very open and YouTube’s is pretty ok. But overall we’re moving towards a world where external researchers can’t see what’s happening, where we’ll be increasingly reliant on those buggy and hard-to-scale old-school methods.
(For some work I did with CASM Technology and the Institute for Strategic Dialogue on how these issues might affect research into extremist communities online, see here.)
Why is all this happening?
APIs arguably used to be too permissive. There was a golden age of APIs about 2010-2016 (I was lucky to be doing a PhD at that time). I remember being concerned at how easily I could get information about strangers from Facebook. I wasn’t alone; in 2014 I was at a presentation by David Stillwell describing his research using a Facebook app called MyPersonality. He and Michal Kosinski had found, contrary to some expectations, that Facebook ‘likes’ could be accurate predictors of various psychological traits. Stillwell was clearly concerned about some of the worrying implications of these findings. He had already refused advances from a company called SCL on ethical grounds. That company later became known as Cambridge Analytica. That’s the main one of the aforementioned privacy scandals that kicked off Facebook gradually closing down API access.
Or rather, that’s a story platforms sometimes tell – this is about protecting users’ privacy. There are trade-offs that moral actors need to make between preserving privacy and addressing harmful stuff which happens online. And, to be fair, data protection laws like the GDPR can sometimes be unclear about data access requirements. But that’s a solvable problem – the EU and other regulators are, on the whole, not keen to stop research or limit transparency. And platforms seem happy to be quite lax about privacy when it makes them money (interestingly, the most impactful limit to platforms’ use of personal data seems to have been a technical privacy restriction imposed by Apple rather than any sort of legal or moral pressure). Whatever the case, completely turning off a long-running research platform, or making your API ridiculously expensive, does not seem genuinely motivated by any moral dilemma over privacy.
Platforms also offer less high-minded reasons than privacy for restricting API access. Musk claims it’s to stop people using it to make bots and spam (his usual excuse for anything Twitter-related); but he could still allow researchers to collect Tweets, and just restrict the ability of people to use it to create Tweets. His other usual excuse, that Twitter needs to make money, is perhaps more credible (though maybe not scaring away advertisers from a platform that largely makes its revenue through advertising might be a better plan). APIs do cost some money to run; and also they can allow people to use data from your product to make money for their product, in a way which might even steal custom from you. Dick Costolo, who from 2010-2015 was Twitter’s third CEO (of five, or six if you count Jack Dorsey’s two tenures), has argued that not cracking down on these ‘third party clients’ was a big mistake, and Musk recently (and again, suddenly) cut off their access to Twitter; here’s a deeply impassioned post from someone affected by that. On the other hand, the excellent tech business analyst Ben Thompson has argued that most of the best Twitter innovations – including the word ‘Tweet’, for goodness sake – came from these third party clients.
Thompson’s argument is part of a broader moral case against the platforms. They can argue they are private companies, and not obliged to provide their data free of charge to everyone. But platforms are also part of ecosystems; like many businesses they benefit from innovations made elsewhere, and as well as their own benefits they bring negative externalities. And they’re making it harder to address those externalities. As fellow social media researcher Philipp Lorenz-Spreen puts it: “We do not depend on the oil industry to be able to measure CO2, but we are dependent on Facebook to measure polarization on Facebook.”
A Poke in the I
Be in no doubt, social media platforms cannot be trusted to do this research themselves. Even when they publish their own data, it can’t be trusted. In 2015, Facebook used data on video viewing figures to encourage publishers to make more videos; as many argued, including The Onion, it should have been obvious no-one except advertisers wanted articles to be replaced by videos. But numbers presented by Facebook’s execs were powerful. They were also massively inflated, but this was only discovered after a lot of changed media strategies and layoffs. More recently we’ve seen doubts around Twitter’s view counts figure; and Emily Baker-White broke news of TikTok secretly ‘heating’ videos (i.e. manually making them go viral) in order to court influencers.
One could, of course, question if their APIs are to be trusted. But at least with more data comes more opportunity to scrutinise them. Most famously, the journalist Kevin Roose used Crowdtangle* to demonstrate that the sites getting the most engagement on Facebook were those which spread polarising (and often incorrect) right-wing messages. Facebook, somewhat fairly, argued that was showing how highly engaged people use Facebook, not ‘normal’ people. That’s a good example of arguing with evidence. They then, possibly relatedly, made Crowdtangle rubbish. That isn’t.
* This article is worth reading to see many of the themes of this newsletter through the lens of Facebook internal culture fights.
There are other businesses which (i) can cause social risk and (ii) have a lot of internal data which can help uncover that risk. The finance industry springs to mind. Similarly to tech platforms, they have tried to restrict how much scrutiny they can be put under. The CEO of the now-famous Silicon Valley Bank had previously lobbied to limit banking regulation, presumably thinking they would be safe without it. And, well, look how safe they turned out to be. So maybe this is a feature of high-tech capitalism. But what was exciting in the younger days of tech platforms was that it felt they brought a kind of internet-native hacker ethos, of open data and transparency – which is now being replaced by cold hard self-centredness.
As with the finance industry, some regulators – particularly the EU – are beginning to circle platforms. Laws like the Digital Services Act, and other agreements like the EU Code of Practice on Disinformation, are pushing platforms to share data with vetted researchers. But I think it’s a good bet that will come with all sorts of delays, lots of fights over what terms mean (with platforms trying to be as restrictive as possible), and researchers having to narrow and focus the scope of research onto areas of ‘likely harm’; despite, sometimes, problems being in surprising places. In the meantime, platforms can carry on doing deep research into their internal data, but with no guarantee they’ll use the results to make society better.
And there are complicated debates to be had. As Daphne Keller says in her very good blog on US transparency regulation, there is “an irreducible problem: We cannot have both optimal research and optimal privacy”. There are also other problems about increased data access potentially helping malicious hacking, vexatious litigation, even government overreach when trying to regulate speech. Also providing data access does create work for platforms, particularly smaller ones. Fine. But the experiences of Crowdtangle and Twitter do not feel like this complex balancing act is in any way being conducted in good faith by platforms. Instead it feels like a constant struggle to get platforms to do something good – or even just something not actively bad.
So to sum up my mood as a social media researcher: movements towards something better come slowly, and with no guarantee that good ideas will actually work in practice. But movements towards worse things come suddenly, unpredictably, and disruptively. Despite Twitter originally giving less than a week’s notice for the free API closure, it’s now been over a month and the free API is still available. Great. But whenever there’s a Twitter bug, I now think – is The Bad Thing happening now? We still don’t know what’s happening with Crowdtangle either. There are projects I don’t know whether it’s worth beginning. I’m sure there are a fair few PhDs in suspended animation right now.
In the meantime, Twitter can get f***ed, we need to keep fighting for the boxes of the internet to not be sealed off, and I’m off to use an API to crowdsource ideas for less worrisome things to write about next time.
Fun Fact About: The Good Friday Agreement
This week was St Patrick’s Day, and next month sees the 25th anniversary of the Good Friday Agreement. Alastair Campbell – for all my misgivings about his popular The Rest Is Politics Podcast – records the path to the Agreement very well in the second volume of his diaries.* It’s stuffed with loads of great stories, but I think my favourite is arguments over the shape of the negotiating table and the optics thereof.
The Republicans wanted all parties to be sat on the same side, looking like friends; the Unionists wanted the parties to be opposite each other, as adversaries. The deadlock was resolved when a junior civil servant suggested a diamond-shaped table. The civil servant who allegedly suggested this idea, Robert Hannigan, later went on to be Director of GCHQ.
* I find this particularly impressive given I struggle to keep a diary at even slightly busy times. I have one diary entry from late 2019 that reads ‘since I last wrote in this we have changed Prime Minister, had a general election, and I’ve left 10 Downing Street. There is a lot to catch up on.’
Recommendations
News-reading app: The founders of Instagram have released a news recommender app called Artifact. The idea is it uses AI to learn what topics you like to read – including really niche ones – and gives you articles about those topics. I’ve been trying it and I find it ok at showing me tech news that I might have missed; but I find the range of sources frustratingly narrow, and if I had to pick one I’d stick with Feedly. But in combination, they’re quite good (in both cases, it’s important to stick with them for a bit). Also the founders gave a couple of interesting interviews to Hard Fork and Stratechery about AI recommendations, news, and being second-time founders.
Berlin history: My photographer friend Michael Berger took me on a walking tour of Karl Marx Allee this morning. I’ve walked along it many times – indeed I used to live near it – but I never appreciated just how historically rich it is; it was a symbol of Soviet planning, with all the complications that brought with it over the period of the East German Republic. This blog is not quite a replacement for Michael’s excellent tour, but probably much easier for you to access.
Comedy: If you are aware of the sitcom Community, which ran 2009-2015 and launched careers including Donald Glover Jr and Alison Brie – and was a major part of my lockdown experience – you’ll probably be aware of the associated meme ‘six seasons and a movie’ (itself an in-joke from the show). Well, 8 years later, they are apparently doing that movie. I’ve no idea if it’ll be good. But in the meantime, if you haven’t already seen the brilliant, well written and acted, and often thought-provoking series, I thoroughly recommend it. It parodied the rise of social media in 2014, a full 2 years before the famous Black Mirror episode. And you get to see, amongst other things, Jim Rash – who later won an Oscar for screenwriting The Descendants – discover his rap powers while dressed as a peanut butter bar.
Thanks for reading. Please do share with others using this link, and let me know your thoughts via this short poll.