A team of researchers at the Max Planck Institute and Northeastern University may have figured out how to beat ad discrimination on Facebook once and for all. After a string of alarming ProPublica stories, Facebook has had trouble keeping advertisers from targeting ads to specific races, potentially violating discrimination laws around housing and employment. But the research team has a new idea for how to approach the problem: judging an ad based on the overall set of users it targets instead of the inputs used to reach that set. The researchers laid out how such a system might work in a paper last month, the first step in a project that could rewrite the way ads are targeted online.
But with the basic idea laid out, the team has run into a bigger problem: Facebook itself. The next step should be to find out how many Facebook ads might end up blocked by the new anti-discrimination system, but Facebook’s data policy makes that question impossible to answer. Anyone who buys an ad can see the targeting tools, but only Facebook knows how people are actually using them.
“We just don’t know how the ad-targeting system is being used,” says Northeastern University professor Alan Mislove, who co-authored the paper. “Only Facebook has that data, which is an unfortunate situation.”
That lack of data blocks research into some of Facebook’s biggest problems, going far beyond just ads. For critics, Facebook is spreading conspiracy theories, enabling Russian influence campaigns, and actively undermining democracy. But despite the mounting concerns, there’s very little data detailing how serious those problems really are or how they arise. We just don’t know how Facebook affects society at large — and the company’s data lockdown makes it impossible to find out.
In some ways, this lack of public information is part of Facebook’s promise to users. It’s easy to research Twitter, where every post is public, but the same accessibility opens the door to more aggressive doxxing and harassment. The vast majority of Facebook, on the other hand, is hidden from public search, which makes it much harder for anyone to know what’s happening. Researchers can see public groups and interest pages, but without a linked person’s login token, they can’t break through to private profiles or groups, making it difficult to trace influence campaigns or misinformation.
As a result, when researchers want to find out how fake news spreads or conversations get derailed, they turn to Twitter. Open-source researchers like Jonathan Albright are able to track how troll networks seize on specific outlets and stories on Twitter. But on Facebook, such analysis is simply impossible.
“Even if you can get the data, you are left without the necessary means to fully understand it because of the closed (proprietary) and constantly changing News Feed algorithm,” Albright says. “The underlying platform data is simply not accessible.”
When analysts do look at Facebook, it tends to be the easily accessible parts. Reporters have focused on the Trending Topics board largely because it’s visible and accessible, even if it doesn’t drive significant traffic or user interest. Other reporting has relied on tools like CrowdTangle, which can show share volume from specific publishers (particularly useful for profiling fake news outlets) but gives little sense of how stories are spreading across publishers. As a result, reporters can spot misinformation when it goes viral, but even the most sophisticated tools can’t tell them how it got so popular.
That’s particularly urgent because Facebook is facing serious questions about its impact on society, and we have no data to tell us which concerns are important. Earlier this week, UN officials said Facebook played a role in a possible genocide against Rohingya Muslims in Myanmar — a horrifying charge, if true. It would be immensely valuable to track how anti-Rohingya sentiment actually spread on the platform. The results might exonerate Facebook or point toward specific changes the platform could make to address the problem. As it stands, we simply have nowhere to begin.
The same problem is in play when researchers go looking for bots or otherwise fraudulent accounts. There are more than 20 million dummy accounts on Facebook, and while the company prefers to keep bot-hunting efforts internal, a small industry has grown up for researchers who can reliably spot and report the accounts. Facebook does accept reports from those researchers, but it doesn’t make it easy for them to find connected accounts or report them, often requiring person-to-person contact with an individual support team member.
One bot-hunter, who asked not to be named because of his ongoing work with Facebook, said person-to-person reporting made his work far more difficult. “The challenge with this approach is that individual support team members may have slightly different interpretations of what constitutes abusive content,” the researcher said, “and larger campaigns consisting of hundreds or thousands of abusive posts / profiles are difficult to manually report at scale.”
Reached by The Verge, a Facebook representative pointed to the company’s ongoing bug bounty program, which Mislove has worked with before, as an example of collaboration with outside researchers, and said the company is eager to find new ways to work with researchers provided the work doesn’t compromise user privacy. “We value our work with the research community and are always exploring ways to share and learn more from them,” the company said in a statement.
After a wave of ad scrutiny, Facebook is already testing out more ad disclosure pages that would show every post from a given advertiser, scheduled to roll out in the US before the 2018 elections. It’s significantly more data than you can get from systems like AdWords, although it still wouldn’t give much detail on how the ads are targeted. For researchers like Mislove, that leaves the most important questions still unanswered.
“What I need to know is the targeting parameters,” he says. “Without that, we either need to trust Facebook, or we need better tools to get some visibility into how their ad system is being used.”