What’s in a Name?

A man in a frock held up his right hand and said a few words, and I got to fill out some paperwork. It is strange that this process makes me feel better about myself. I abhor paperwork.

Rites of passage have been sociologically and psychologically important to humans for, well, as long as recorded history. As someone who grew up without cultural roots (or rather, with roots in several different major world cultures) and never identified with any specific culture, I’ve often been a little jealous of folks who get to have a bar/bat mitzvah, a big ol’ denominational wedding, or the like. I never had those. Heck, I didn’t even have a graduation ceremony!

So a few years ago I started making my own sacred rites, celebrating the milestones of life in my own way. Fuck society and fuck your burn — by which I mean, you should make your own rules, too.

In 2017, I threw myself an unwedding (complete with a tiered cake) to mark the end of one era of my life and celebrate the formation of so many new connections to my chosen family.

2017 was also the year I started to come out. To present. To be present.

I wonder how I’ll look back on this day and celebrate it next year….

The Politics of Machine Learning

Why would anyone care enough to launch a distributed effort targeting children’s videos on YouTube? I think it’s politically motivated, and part of a larger effort to force internet platforms to adopt stricter restrictions on online expression.

Your Comments ran over My Videos

Five days ago, I was sitting in the balcony of the Castro Theater in San Francisco, attending the Lesbians Who Tech summit, in complete awe as Kara Swisher and Susan Wojcicki met on the keynote stage for a candid conversation. These two seemed to have a friendly rapport, but this discussion was tense! Honestly, I’ve never seen a keynote stage so tense – Wojcicki appeared to be sweating, uncomfortably dodging questions, unprepared or unwilling to answer clearly in some cases. This has been written about elsewhere and the discussion was captured by CNBC, with this exchange starting around 1:33 in the full video.

The topic at hand: why had YouTube just purged over 400 channels, hundreds of millions of comments, and globally disabled comments by default on any video it determines is related to children?

“You have all reaped the benefits, but not [accepted] the responsibility, of having this platform.”

Kara Swisher

The short answer is somewhere between “YouTube was losing ad dollars” and “protect the children!” The backstory is far more complicated, has implications that are sweeping across our society, and is, I believe, about to land on Capitol Hill.

To explain this, I need to talk about a pair of bills passed in 2018 called SESTA/FOSTA. I am going to talk about the widespread use of AI & Machine Learning, and the use of bots to interfere in US democracy. Bear with me, this is gonna be a journey!

At Its Heart

Kara Swisher told a story of how hate speech became a dinner-time topic when her son, in about three clicks, went from Ben Shapiro to neo-Nazi videos.

The conversation on the #LWTSummit stage was, at its heart, about how trolls are targeting children via videos both of and for children, and why YouTube has responded in the way that it has. Clearly, protecting children from being exposed to … well, whatever their parents don’t want them exposed to … is very important. I don’t have kids, and this piece isn’t about me morally policing what parents choose to teach their children, so I’m not going to presume any specific topics here. I think we can all agree that neither ad-revenue-based social media companies nor internet trolls should be choosing what kids see online.

All you tech companies built cities, beautiful cities, but decided not to put in police, fire, garbage, street signs… and so it feels like the Purge every night.

Kara Swisher

Troll farms aren’t new. This has been going on for years, and receiving coverage even in the halls of Congress. All the platforms have begun working on solving this; here’s an announcement from Twitter, for example.

What’s new right now is the target: troll farms began posting inappropriate comments on videos of young children. Even a completely innocuous family video of a child can, when cast in an inappropriate light (eg. by insinuating sexualization in the comments), become a violation of the platform’s Terms of Service, objectionable to advertisers, and of course, very objectionable to the parents. The result: advertisers pulled out of YouTube. Money talks, so YouTube’s CEO had to respond.

Back to the stage, the conversation turned to the big question: what is YouTube gonna do about it? And why was deleting and disabling comments the best answer that YouTube (and by proxy, Google) could come up with?

This is where we didn’t really get a straight answer. Wojcicki deflected with semi-technical answers, and what became clear is that YouTube is fundamentally unable to employ humans to directly moderate the volume of video traffic that’s uploaded. She stated that 500 hours of video footage is uploaded every minute to YouTube. Every minute! To handle this scale, the only viable approach is to train machines to augment and amplify the reach of humans.

(Incidentally, this is why I prefer the term augmented intelligence rather than artificial intelligence.)
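To get a feel for that scale, here’s a quick back-of-envelope calculation based on the 500-hours-per-minute figure Wojcicki cited (the reviewer throughput below is my own assumption, purely for illustration):

```python
# Back-of-envelope: why human-only moderation can't keep up with
# YouTube's stated ingest rate of 500 hours of video per minute.
# (The reviewer throughput is an assumption, not a YouTube figure.)

HOURS_UPLOADED_PER_MINUTE = 500
MINUTES_PER_DAY = 60 * 24

hours_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY  # 720,000 hours/day

# Assume one reviewer watches at 1x speed for an 8-hour shift:
REVIEW_HOURS_PER_SHIFT = 8
reviewers_needed = hours_per_day / REVIEW_HOURS_PER_SHIFT

print(f"{hours_per_day:,} hours of video uploaded per day")
print(f"~{reviewers_needed:,.0f} full-time reviewers just to watch it all")
```

Even if reviewers skimmed at triple speed, you’d still need tens of thousands of people around the clock just for video, before touching a single comment.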

Machine Learning in 2019

So here we are. It’s 2019 and Machine Learning must be used to enable a small team of humans to moderate a very large amount of video content on YouTube. For those not deep in the AI/ML field, this is done through a technique called “supervised learning.” It goes something like this:

  • A team of moderators reviews videos and comments
  • They apply attribute tags based on their judgment (eg., what’s hate speech, what’s a cat video)
  • This data set (videos & comments & attribute tags) is called a “training set”
  • Machine learning is applied to the training set, creating a “trained model”
  • The model’s key property is that it can view videos or comments which were not in the training set and predict what attributes the human moderators would have applied, had they seen them
  • Finally, the model can be deployed at “cloud scale” to moderate all the videos and comments on a platform – even right at the moment you upload a new video
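The loop above can be sketched in a few lines of Python with a toy bag-of-words Naive Bayes classifier. Every label and comment here is invented, and a real moderation system would be a far larger neural model, but the supervised-learning shape is the same:

```python
# Toy supervised learning: humans tag a training set, we train a model,
# the model predicts tags for content it has never seen.
from collections import Counter, defaultdict
import math

def tokenize(text):
    return text.lower().split()

def train(training_set):
    """training_set: list of (text, label) pairs tagged by human moderators."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    for text, label in training_set:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def predict(model, text):
    """Return the label a human moderator would most likely have applied."""
    word_counts, label_counts = model
    total = sum(label_counts.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + log likelihood, with add-one smoothing
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Steps 1-3: human moderators build the training set (invented examples)
training_set = [
    ("what a cute cat video", "ok"),
    ("lovely family vlog", "ok"),
    ("hateful slur example here", "abuse"),
    ("more abusive harassment text", "abuse"),
]
model = train(training_set)               # step 4: train the model
print(predict(model, "another cute cat"))         # → ok
print(predict(model, "abusive hateful comment"))  # → abuse
```

Once trained, `predict` is cheap enough to run on every upload, which is exactly the “cloud scale” deployment in the last bullet.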

That’s good, right? Humans teach machines how to spot abuse, ToS violations, spam, etc; the machines moderate the interwebs; there is less hate online; children are safer; and, naturally, YouTube keeps earning ad dollars and also saves money by paying a hundred employees to train machines instead of tens of thousands to do the moderation by hand. Business as usual continues. At least in the eyes of Capitalism, this is good…. but it doesn’t yet explain why they had to shut down comments so broadly.

A side effect of Machine Learning is that human bias is amplified. Imagine what would happen if everyone on that moderation team were a white man. Now imagine if they were all black women. Clearly, the resulting moderation would be … different. I’m not going to speculate on exactly how it would be different, but I hope anyone reading this can follow the analogy and see that there will always be differences in ranking and rating of content based on differences in the lived experiences of humans.

Technology doesn’t solve this bias – it only amplifies the bias because it amplifies the reach of the humans who built and trained the ML model. Building a training set that is diverse, inclusive, and not biased is … difficult, if not impossible. We, as an industry, are struggling with this topic right now; it was a recurrent topic at AI/Next Con 2018 and 2019, and also addressed directly at the ML4ALL 2018 conference. As far as I know, no one has a good answer yet.
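As a toy illustration of that amplification, imagine (purely hypothetically) that a homogeneous moderation team disproportionately flagged comments written in an unfamiliar dialect. A model trained on those tags reproduces the skew against every future comment at cloud scale:

```python
# Toy bias amplification: the training data below is entirely invented.
# A homogeneous team misread a dialect marker ("yo") as hostile, and the
# trained filter now flags harmless comments that use it.
from collections import Counter

training_set = [
    ("yo this video is great", "abuse"),  # dialect misread as hostile
    ("yo nice one fam", "abuse"),
    ("this video is great", "ok"),
    ("nice one", "ok"),
]

# "Training": score how often each word co-occurs with the abuse tag
flag_votes = Counter()
for text, label in training_set:
    for word in text.lower().split():
        flag_votes[word] += 1 if label == "abuse" else -1

def flags(text):
    """Flag a comment if its words leaned toward 'abuse' in training."""
    return sum(flag_votes[w] for w in text.lower().split()) > 0

print(flags("yo what a cute cat"))  # True: harmless, but flagged
print(flags("what a cute cat"))     # False: same meaning, no dialect marker
```

The model isn’t malicious; it is faithfully generalizing the only lived experience it was given.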

Another side effect is that these algorithms have weaknesses. If you understand how ML works and have insight into the specific model or training set, you can often create discrete signals which trick the algorithms into classifying your content incorrectly. In other words, through an adversarial approach you can often fool the network and do things like trap autonomous vehicles, avoid facial detection, inflate the rank of books on Amazon, or potentially hack corporate chat bots. (The last link goes to the abstract for a talk that was just delivered on March 2nd. I’ll update this if I find a recording.)
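Here’s a toy sketch of the adversarial idea. The banned-word filter and the leetspeak substitution are my own invented example; real attacks on neural models perturb inputs far more subtly, but the principle is the same (and it’s the same trick this very article uses with “53X”):

```python
# Toy adversarial evasion: a naive keyword filter (invented for
# illustration) is trivially fooled by look-alike character substitution.
BANNED = {"spam", "scam"}

def naive_filter(comment):
    """Block a comment if it contains any banned word verbatim."""
    return any(word in BANNED for word in comment.lower().split())

def adversarial(comment):
    """Attacker rewrites text with look-alike characters (leetspeak)."""
    return comment.replace("s", "5").replace("e", "3")

print(naive_filter("buy my spam product"))               # True: caught
print(naive_filter(adversarial("buy my spam product")))  # False: slips through
```

Against a neural classifier the attacker probes for inputs near a decision boundary instead of swapping characters, but the asymmetry is identical: the defender must handle every input, while the attacker only needs to find one blind spot.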

Unfortunately, the discussion between Swisher and Wojcicki didn’t reach a great conclusion. YouTube needs to keep trolls from harming kids, so it needs to moderate content, and the only way to do that is with ML … but ML has weaknesses which are inherent in the technology, so the only solution is to be over-protective: yes, some good comments and normal videos will get censored, but no children will be harmed.

These two problems – the need to use Machine Learning to moderate the internet, and its weakness to training bias and adversaries gaming the algorithm – affect the whole industry. Amazon, Facebook, etc., need to moderate their platforms as well, and their ML models aren’t immune to these problems.

So Machine Learning, the solution to yesterday’s problem of scale, has created today’s problem. Trolls target a platform, game the ranking and moderation algorithms, harass children and minorities, and the best answer we have to this? Mass censorship. Disable comments and delete content that might be objectionable with an over-reaching definition because it’s better to be safe than sorry.

Let’s back up

Let’s back up a few months, because censorship is already alive and well on the Internet. It’s just that most people haven’t taken notice.

There was a brief stirring a few months ago when Tumblr decided to ban sensitive content on their platform, despite an outcry from the affected community about the impact this would have on their businesses. The ban took effect on December 17th, a date which, coincidentally, is recognized as the International Day to End Violence Against 53X Workers. (No, that’s not a typo; I’m intentionally avoiding certain words in the hopes this improves page rank.)

Who remembers the Craigslist Personals section? It was taken down a few months earlier, after being part of the platform since the beginning of Craigslist.

All over the internet, our network and social platforms – previously immune to litigation for media served through them – have quietly been deploying censorship tools to protect themselves from two new laws (known as SESTA and FOSTA) which stripped away a core internet protection, known as Section 230. That provision had protected our ISPs and social networks for 20 years from legal liability for content which users uploaded; the user was, of course, still held responsible if they broke the law. At the beginning of 2019, the liability shifted from users who break child endangerment or trafficking laws to the executives who operate the platforms on which the content is shared.

We restrict the display of nudity or sexual activity because some people in our community may be sensitive to this type of content. Additionally, we default to removing sexual imagery to prevent the sharing of non-consensual or underage content.

Facebook Community Standards (2019)

In their gross overreach, these new laws make it a very serious crime for any platform (eg. Tumblr, Facebook) to participate in any way, even unknowingly, in enabling the harming of minors via trafficking.

However, some things are fundamentally broken about this:

  • It is socially impossible to determine whether a picture of a person, posing in their underwear, posted online, is voluntary or not.
  • It is technically impossible to distinguish between: sand dunes and n00ds; male, female, or non-binary nupples; art and pr0n.

The result? Just like YouTube laying the ban-hammer on comments, several internet platforms had already banned anything 53Xual in nature. Even Facebook amended its Community Standards to exclude 53xuality-related pictures and speech, and I have heard anecdotal reports this is being applied to comments and discussions in private groups as well.

No one disputes the importance of protecting children from exposure or exploitation online.

Protect The Children

Two days after watching that discussion at the Lesbians Who Tech Summit, I had the chance to have an informal chat with some folks from the Electronic Frontier Foundation in person. It was difficult not to just fangirl at them, and I am grateful for their time chatting with me. I drew a shocking conclusion from the conversation.

Back in 2018, SESTA/FOSTA was pushed through Congress with almost no objection and little review period because of one strategy: congresspersons would not go on record opposing a bill that protected children. Only two senators voted against it. Facebook’s amended Community Standards clearly state that it “defaults to removing sexual imagery to prevent the sharing of non-consensual or underage content.”

Now perhaps you see the connection? There’s a social lever which is available to any political party, either within our country or outside of it. Endanger children and, rightly, you can mobilize an unstoppable public force.

We must keep the internet free not only from ISPs but also from what the impact is of algorithms to decide what goes forward.

Nancy Pelosi

Censorship of comments and videos on YouTube is being justified by the need to “protect the children”, and I suspect we are about to see new bills introduced which legislate this moderation wholesale across the internet. This tactic can be used against any online platform, and if Google can’t muster a response better than disabling all comments on any video related to children, the rest of us have no hope.

It’s Just Politics

Wednesday morning, Speaker of the House Nancy Pelosi introduced a new Net Neutrality bill. At the end of her speech, she pointed out that we must keep the Internet free from the impact of algorithms. Nowhere in the text of the bill is the word “algorithms” mentioned, and I keyed in on it during her speech. It struck a chord, and that was the moment I decided I needed to write this article.

Why would anyone care enough about kids’ videos to create a distributed effort to post millions of fake, harassing comments on them? This isn’t coming from normal users; it’s coming from bots – automated trolls doing the bidding of someone to achieve something.

Our social media platforms already use algorithms – machine learning – to determine what posts we see; someone out there is using algorithms to target children online; companies are responding by trying to build better algorithms to combat the bots, but they’re failing, and Congress is taking notice.

I am concerned that the outcome of this techno-political battle will be another legislative bill mandating the protection of children on the internet from hateful, abusive, or exploitative speech. As someone who works in this field, I know that any technical implementation we create will have a chilling effect. Machine Learning is a new tool and it is still not well understood – even within the Tech Sector. Allowing our freedom of speech to be controlled by a technology which only amplifies the bias of its creators will not be good for our Democracy. Nancy Pelosi was echoing the sentiments of Sir Tim Berners-Lee when she said that we must keep the internet free from algorithms, and I agree.

I’ll end on this note: who do you think would benefit from limiting free speech online?

What happened to OpenStack?

I get asked this question fairly often during interviews, like people think that OpenStack is dead. I make a squishy face before answering. Here’s a hint: OpenStack is not dead, but something happened to it.

OpenStack logo – 2019

The project is still quite healthy and is following the usual hype curve: OpenStack is finally mature enough that it’s less interesting to talk about, so it’s understandable that people think something happened.

The folks asking me this question are generally curious and well-intentioned, and I think my answer says a lot about my professional background, so even during an interview I indulge in a short ramble.

I know, I know … I should have written about it at the time, but I had other things going on and wasn’t making space in my life to write. So I’ve decided to revisit the topic today. Next time someone asks me, I’ll just point them here 😉

We all knew that, for the long-term success of the OpenStack ecosystem, we needed a hyper-scale OpenStack public cloud to succeed, and a healthy set of SaaS applications to be built upon the IaaS/PaaS layers that we had. We also knew there needed to be more than one hyperscale implementation to avoid a gravity problem (in the early days, that was Rackspace).

Publicly, lots of folks were talking about this; common headlines compared OpenStack to AWS, and every VC and analyst wanted to know if it was ready for the Enterprise yet. I even sat on a panel to discuss this very question! (There’s a recording of this, but I won’t link it here because I don’t like being reminded of how I used to look.) But over time, this key component of our success never materialized. Why was that?

Back in 2014 I was working at HP Cloud. Money was pouring into OpenStack and we were all riding high on it. This was the year that OpenStack hype peaked, and I will never forget the Carnival at the Paris Summit … très magnifique! From outside, however, it may not have been obvious that this money was coming from Telcos and Hardware Vendors seeking to influence the direction of OpenStack. Specifically, these companies wanted to create the technological basis for integrated private cloud solutions they could take to market quickly.

Privately, we could see that these market forces were opposed to actually achieving hyperscale. The Big Tent was a bold move to try and address some of these issues. We (that is, the Technical Committee) did what we could, but in the end we couldn’t fix the problem because we didn’t control the purse strings.

You see, almost every company contributing substantially to OpenStack at that time relied on Enterprise customers. That’s a nice way of saying well-known brands who spend a lot of money on each other every year. Unlike the hyper-scalers, Enterprise customers generally do not have an interest in building their own servers or writing their own switch firmware. They buy these things (and the support contracts for them) from other Enterprise companies that make them, and those companies want to keep making money from these deals.

Meanwhile, distros and operators were struggling to make OpenStack installable so that they could roll out, and maintain, deployments to satisfy the market demand. And the Enterprise demand for new private cloud regions was incredibly high. A lot of internal pressure was being put on the OpenStack developer teams (at the distros, Telcos, and HW vendors) to create software that worked well at the scale these Enterprise customers needed: roughly thousands to tens of thousands of cores, with a small number of regions for geographic resiliency.

However, building a distributed system for a scale of 10,000 cores is fundamentally different than building for 10,000,000 cores. Besides just changing the code, scaling up by another three orders of magnitude requires fundamental changes in how any business operates. The margins would need to become much, much smaller. Gone would be the 3rd party support contracts, the B2B deals, the value-added software – the grease in the sales teams’ wheels. The enterprise hardware & software giants were not willing to restructure their businesses that much, and the organic growth of companies like OVH was too slow (even though I think they are still on the right track).

Creating a viable, open source, hyperscale cloud software solution was against the best interest of the companies most heavily investing in OpenStack’s development.

In my usual fashion, I tried to draw this on a napkin one day. The original is long since lost, so I’ve redrawn it from memory below.

Influence was flowing from the wrong direction. Product managers within the Telco, Hardware, and Distro space — that is, companies like AT&T, Cisco, Dell, and Red Hat — were pouring in money to pay developers to gain influence and write code, funding lavish parties and inflating salaries (in full disclosure, I certainly enjoyed the parties, and while I got paid well, I didn’t make out nearly as well as a lot of folks did). Meanwhile, Operators were getting further and further behind the release-train and had little voice (they were busy operating the clouds, after all) … and the end-users of the cloud (app developers) had almost no voice at all.

Eventually, revenues would need to come from the growth of supported businesses on top of the cloud (ie, app developers’ success), but we couldn’t get there because operators weren’t able to scale up enough, or maintain cross-cloud compatibility well enough, for a healthy market to flourish on top of the cloud. Everyone on the left-hand side of that diagram was too busy trying to differentiate to win deals against each other.

With this going on, the upstream technical team leads — who could see the problem — were lobbying inside our respective companies to change the priorities. We convinced some managers and executives to get onboard with tactical investments in code and infrastructure, but we couldn’t get enough backing for the larger changes that were needed.

As long as the financial influencers were focused on the business models of mid-sized customers (MSP’s and Enterprise markets), they were not willing to invest in the massive strategic efforts needed to make OpenStack competitive in the hyperscale market. The vast majority of developers, whose project goals were dictated by PMs and executives within their employers, were thus busy driving features (designed to take advantage of “value-added integrations”, the grease of the Enterprise sales cycle) into the codebase, and no one could fix the core issues that had become apparent to all the deployers who were busily trying to scale up their clouds.

You see, the successful hyperscalers had already cut out the middlemen and their marginal costs. Google, AWS, even Facebook, all make their own servers, switches, storage, etc. They fork open source projects, then pay developers to maintain and improve private versions so that they can stay ahead (some projects are now trying to prevent this). These companies pioneered scale by shaving the costs off their infrastructure, but those efficiencies have not made it back to the Enterprise market. They became the behemoths they are, in no small part, by building processes, teams, and hardware to break any reliance on 3rd parties for software, hardware, or support.

So, you see, creating a viable, open source, hyperscale cloud software solution was against the best interest of the companies most heavily investing in OpenStack’s development.

When you’re looking at other cloud products, think about similar conflicts of interest that might be affecting your favorite spokespersons today… (I’m looking at you, Kubernetes)

Those of us who joined OpenStack early and dreamed big, well, we had no chance of building those dreams within OpenStack. It became something less hype-worthy but perhaps even more useful: an extensible tool for mid-scale infrastructure automation, powering thousands of businesses around the world, including many non-profit and public-good institutions.

And – just to be clear – I’m very glad to have been able to help in my small way to build a tool that has been used so widely for good.

I am publishing this now in the hope that it can serve as a warning to everyone out there who is investing in Kubernetes. You’ll never be able to run it as effectively, at the same scale, as GKE does – unless you also invest in a holistic change to your organization.

Even then, if you’re using Kubernetes, you probably won’t succeed, because it isn’t in Google’s best interest to let anyone else actually compete with GKE.

And that’s probably OK. In fact, OpenStack + Kubernetes is probably just fine for whatever you’re building, and neither project is going anywhere any time soon.

If another company wants to stand a chance in the hyperscale space, it will need to look holistically at its entire business apparatus, and either repeat the Google/Facebook/Amazon model of privatizing open source and investing in its own server fabrication, or band together with other vertically-focused, open source businesses in order to compete.

And if you’re doing that, please drop me a line. I’d love to help.