
Content Moderation for New Products
This is a follow-up to “Content Moderation: Where We Are Now.”
If your product has user generated content (UGC), moderation is an essential burden. It’s probably the last thing on your mind. You and your team have a vision of some great new thing, and dealing with the dark side is a distraction, a time sink, and a cost. But deal with it you must. Products from small to large haven’t fully solved this. The large-scale social networks are the lightning rods, but every service contends with it. And you take pain from at least two sides: managing moderation itself, and the criticism from those who think you’re doing it wrong. Maybe too little. Maybe too much. Most likely just not “their” way. You need communications management in place to deal with this as well.
Spoiler Alert: At the end of this writeup is a link to a code sample showing how content classification for moderation can work.
Goals First
Why are you using user generated content? Is it core to your product, or an enhancement? Do you need to care much about free speech? You don’t have to; free-speech obligations apply to the U.S. Government, not to companies. You can moderate all you want. The question is, within your context, to what level? If you’re offering chat for kids, parents will be happy for you to identify 100% of posters and will expect 0% bad things. If you’re running a travel forum, you may think you’re going to get nice content, but you may get issues with critical reviews, perhaps disparaging and lawsuit-provoking, but true. How will you handle these? What are your goals? Sort these out first so you can align your policy with them.
Policy
While the First Amendment doesn’t bind companies, users expect a level of free expression. Strive to establish clear policies that balance those expectations with the needs and values of your platform. You can find checklists to start modeling this, but think carefully about your own situation and customize. (We’re not going to attempt an exhaustive list in this high-level writeup.)
Risk Assessment
Create a risk assessment. Many line items should be obvious: illegal speech showing up, various forms of inappropriate behavior, and so on. But is there anything specific to your topic area or special to your industry?
Don’t forget about the changing attacker and bad-actor landscape and how much wider your attack surface may be. Relatively low-skilled attackers, spammers, and the like now have access to sophisticated tools. It’s unfortunate you must face all of this. Maybe you have a nice little niche website selling custom potholders from around the world, all made by fair-trade craftspeople. You’re a beautiful company and CEO or Product Person. And yet, you may have to deal with these issues. Can you skip the risk assessment and save the time and money because you think you’re not a juicy enough target? Sure. Just consider: the difference between a relatively quick, prepared response and blowing in the wind while you sort things out could be the difference between a painful week and a company-killing week. Let’s speak plainly. Costs suck. Fraud mitigation sucks. Criminals suck. Most people may be good, but online, the few million who aren’t are just a click away. If they target you, there’s no place to run from the fight. You’re probably fine; the probability that you’re the one who ends up as headline news is low. But the magnitude of the risk from just the wrong kind of lawsuit may be high.
Also consider internal risks. Whether moderation is done by your own staff or by contract staff, you could be named in any lawsuit. Whatever indemnity agreements were signed with a contractor, you’re still likely to be served. Add that exposure to your list.
Your risk also includes your entire company being censored or blocked. In the current landscape, the issues you face aren’t just about how you manage your customer policies and interactions; they’re also about how your whole company might be treated.
We’ve seen instances where businesses have been banned from various services. This can include your ISP, your banking relationships, distribution in app stores, and more. Whatever you personally think of others’ services, this is arguably a dangerous thing. Parler was “deplatformed” in 2021 and had to find new digs. Maybe your perspective happens to lean left and you think that’s a good thing. OK, then what about the WikiLeaks payment blockade? What about TikTok or OnlyFans? And while not a U.S. issue, we’ve seen China’s blockade of Google, Facebook, and Twitter. How about PayPal denying service to Palestinians? There are more examples. Where do you fall in a risk profile here? Maybe not at all. But if any aspect of your service might fall under what anyone else could consider controversial, you have to consider the risk of being banned from critical services yourself. So consider alternatives.
Filters
Empower users with tools to filter out content from non-identified posters or low-reputation contributors. Features like Reddit Karma-style scores can encourage quality participation, but be vigilant about trolls or manipulation that can undermine these systems.
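As a rough illustration (the field names and thresholds below are hypothetical, not tied to any particular platform), such a user-side filter can be as simple as hiding posts whose authors fall below a reputation threshold or lack verification:

```python
# Hypothetical illustration only: a user-side filter that hides posts from
# low-reputation or unverified contributors. Names and thresholds are made up.
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    author_karma: int       # Reddit-Karma-style reputation score
    author_verified: bool   # passed some identity check
    text: str

def visible_posts(posts, min_karma=10, require_verified=False):
    """Return only the posts a user has chosen to see, per their filter settings."""
    return [
        p for p in posts
        if p.author_karma >= min_karma
        and (p.author_verified or not require_verified)
    ]

feed = [
    Post("longtime_member", 450, True, "Great pattern, thanks for sharing!"),
    Post("brand_new_account", 0, False, "CLICK HERE for a free gift"),
]
print([p.author for p in visible_posts(feed)])  # ['longtime_member']
```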
User Identity
Establish mechanisms to identify users while respecting privacy. One day in the not-too-distant future, blockchain-based identities or other verifiable reputation systems may provide trust and accountability without compromising anonymity or pseudonymity for those who desire or need it. For now, you’re faced with more traditional methods, running from simple email verification up to phone numbers and perhaps full identity-document verification. What level you need is clearly product dependent, and each carries a different cost. Anonymous and pseudonymous speech has value, from dissidents to people with personal medical issues they’d like to discuss. However, if you’re going to provide these levels of user cloaking, you’ll still need to sort out how to maintain compliance with your service policies.
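A minimal sketch of what those levels might look like in practice, assuming a made-up tiering of email, phone, and document verification and a hypothetical mapping of actions to required tiers:

```python
# Hypothetical verification tiers and action requirements; adjust to your product and policy.
from enum import IntEnum

class VerificationLevel(IntEnum):
    NONE = 0         # anonymous / pseudonymous
    EMAIL = 1        # confirmed email address
    PHONE = 2        # confirmed phone number
    ID_DOCUMENT = 3  # full identity-document verification

REQUIRED_LEVEL = {
    "read": VerificationLevel.NONE,
    "comment": VerificationLevel.EMAIL,
    "post": VerificationLevel.PHONE,
    "sell": VerificationLevel.ID_DOCUMENT,
}

def can_perform(user_level: VerificationLevel, action: str) -> bool:
    return user_level >= REQUIRED_LEVEL[action]

print(can_perform(VerificationLevel.EMAIL, "comment"))  # True
print(can_perform(VerificationLevel.EMAIL, "post"))     # False
```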
Community Review
Volunteer moderators can help scale moderation efforts, but they come with challenges. Ensure clear guidelines and oversight to prevent bias or abuse, as even well-intentioned moderators can make decisions that harm the community. Always ask: who moderates the moderators?
Your Tool Options
You’ve got the two typical issues: quality and cost. You’ll have to estimate what level of quality you need and align that with your business model. If you’re thinking, “well, I’d like 100% quality,” whatever that means to you, it could imply there’s little way you can do anything in realtime, and your costs will be high. If you’re OK with something you’d disallow staying up for a while (whether that’s an hour, a day, or more), that’s something else.
Here’s what you’ve got:
Technology-Driven Solutions
- AI Tools: Use artificial intelligence to detect and flag harmful content, such as hate speech or misinformation. AI can process large volumes of data faster than human moderators, but you’ll need to acquire foundation models or train your own within your category. Or… you don’t necessarily need the latest LLMs for this. If you have smaller volumes or narrow topic areas, more traditional methods might work just fine, including rule-based spam filters, logistic regression, and other types of classifiers. (A small sketch of this approach follows this list.)
- Keyword Filters: Implement word filters to automatically block or flag problematic language while allowing for context-sensitive reviews. On their own, these days, keyword filters are likely too simplistic.
- API-Based Tools: These likely overlap with the external-services options below, and can range from simple language checks to full AI tools, more likely leaning on the latter.
- Multimodal Considerations: You may need multiple tools if you allow various content types: imagery, video, audio, or others.
- Multilingual Considerations: If by policy you keep to one language, you’ll still need to detect other languages (outside of common idiomatic use), or deal with multiple languages if that’s allowable.
- User & Content Management Tools: Regardless of detection method (or your use of the options below), you’ll need a means to block, unblock, ban, archive, audit, and so on.
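To make the “traditional methods” point concrete, here’s a minimal sketch that pairs a keyword pre-filter with a logistic-regression classifier built on scikit-learn. The blocklist phrases and the tiny training set are purely illustrative; a real system needs labeled data from your own domain and proper evaluation.

```python
# Minimal sketch: keyword pre-filter + traditional text classifier.
# Blocklist and training data are toy examples for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

BLOCKLIST = {"buy followers", "free crypto"}  # hypothetical disallowed phrases

def keyword_flag(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

# Toy labeled examples: 1 = disallowed, 0 = acceptable.
texts = [
    "I love this pattern, great work",
    "Shipping was slow but the product is fine",
    "Buy followers now, limited offer",
    "Free crypto giveaway click the link",
]
labels = [0, 0, 1, 1]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

def moderate(text: str) -> str:
    if keyword_flag(text):
        return "block"
    score = classifier.predict_proba([text])[0][1]  # probability of "disallowed"
    return "review" if score > 0.5 else "allow"

print(moderate("Free crypto for everyone!"))
print(moderate("The stitching on this potholder is lovely"))
```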
Human Moderation
- Professional Moderators: Employ trained moderators to review flagged content, especially in nuanced or borderline cases where AI might fail.
- Community Moderators: Empower trusted community members to moderate content within defined guidelines. (Make very sure policy is clear here: moderation is not editing. Moving content between categories or suspending it is moderating. Don’t cross the line into publishing; that wholly changes your legal posture.)
- Hybrid Approaches: Combine AI and human moderation for improved accuracy and efficiency. This is often called a “human in the loop” approach, and the results can be fed back into the AI so it improves. You can also consider pre-screening, but anything that isn’t realtime will likely feel weak from a user’s perspective. (A sketch of this kind of routing follows this list.)
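Here’s a minimal human-in-the-loop routing sketch. The thresholds and the classify() stub are hypothetical; in practice classify() would call whatever model or API you’ve chosen, and the thresholds would come from your own tolerance for false positives and negatives.

```python
# Minimal human-in-the-loop routing sketch; thresholds and classify() are stand-ins.
from typing import Tuple

def classify(text: str) -> Tuple[str, float]:
    """Stand-in for your model: returns (label, confidence)."""
    return ("toxic", 0.62) if "idiot" in text.lower() else ("ok", 0.97)

def route(text: str, auto_remove_at: float = 0.95, human_review_at: float = 0.50) -> str:
    label, confidence = classify(text)
    if label == "ok":
        return "publish"
    if confidence >= auto_remove_at:
        return "auto_remove"          # model is confident enough to act alone
    if confidence >= human_review_at:
        return "human_review_queue"   # borderline: send to a moderator
    return "publish_and_monitor"      # low confidence: let it through, keep watching

print(route("You're an idiot"))       # human_review_queue
print(route("Lovely weather today"))  # publish
```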
User / Community Tools and Empowerment
This is essentially crowdsourced moderation for the most part. Whatever the mechanisms used, you’re relying on community action to make judgements. The level at which a trigger fires to suspend or remove content is your choice (a small sketch of such thresholds follows this list).
- Mute/Blocking and Reporting Features: Allow users to block or report harmful content or accounts easily.
- Progressive Access: Let users read first, comment next, and initiate content later. They can build up credits or some other type of score. This is an attempt to socialize users and perhaps limit bad actions. Of course, some bad actors will simply power through this period anyway.
- Content Rating Systems: Enable users to self-categorize or rate content (e.g., “mature” or “safe”). Or just simple likes.
- Custom Filters: Let users customize their experience by setting personal filters for certain types of content.
- Volunteer Moderators: Recruit trusted, active users (superfans or long-time members) as unpaid moderators with limited administrative privileges. This one requires some care. You’ll need to understand the motivations for participating and make sure mods aren’t making some communities worse off. That can happen.
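As a minimal sketch of the trigger choice mentioned above, here’s a made-up report-count policy: auto-hide at a few reports, escalate to a human at a higher weighted count. The thresholds and the reputation weighting are arbitrary illustrations, not recommendations.

```python
# Hypothetical report-count triggers; thresholds are illustrative only.
HIDE_AT = 3       # auto-hide pending review after this many raw reports
ESCALATE_AT = 10  # push to a human moderator once the weighted count reaches this

def on_report(report_count: int, reporter_reputations: list[int]) -> str:
    # Optionally weight reports by reporter reputation to resist brigading.
    weighted = sum(1 + (rep // 100) for rep in reporter_reputations)
    if weighted >= ESCALATE_AT:
        return "escalate_to_moderator"
    if report_count >= HIDE_AT:
        return "hide_pending_review"
    return "no_action"

print(on_report(2, [50, 40]))          # no_action
print(on_report(4, [10, 10, 10, 10]))  # hide_pending_review
```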
Accountability Mechanisms
These don’t necessarily solve all problems, but they can mitigate issues and, in some unfortunate cases, at least allow for removal of a problematic user. Any of these mechanisms, however, only gives you a means to manage individuals. You still have to deal with the content itself.
- Verified Accounts: Require verification for accounts engaging in public discussions to reduce anonymity abuse. There are a variety of identity verification services, though there will be some cost here. You can still use screen names or other mechanisms for anonymous or pseudonymous participation; no one needs to publicly reveal personally identifiable information. But you retain some real control over a truly rogue account if necessary.
- Paywalls: Whether accounts are verified or not, paywalls can motivate good behavior. While paywalls usually imply some degree of identification (e.g., use of a credit card), this is not necessarily the case. Perhaps you allow payment in crypto, which could remain anonymous and allow blocking only by the crypto address.
- Reputation Systems: Introduce systems where user trust and behavior are tracked and impact privileges (e.g., comment visibility or account restrictions). This also implies some degree of identification. If you only do this via an email account or similar, there’s nothing to prevent a user from coming back by other means (unless you’re also trying to fingerprint the user with other technologies).
- Gradual Penalties: Use escalating consequences for violations (warnings, temporary bans, permanent bans). A simple sketch of such an escalation ladder follows this list.
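The sketch below shows one way to encode an escalation ladder. The steps and durations are hypothetical; tune them to your own policy and document them publicly.

```python
# Hypothetical escalation ladder for repeat violations.
from datetime import timedelta

PENALTY_LADDER = [
    ("warning", None),
    ("temporary_ban", timedelta(days=1)),
    ("temporary_ban", timedelta(days=7)),
    ("permanent_ban", None),
]

def next_penalty(prior_violations: int):
    """Return the penalty for a user with this many prior violations."""
    step = min(prior_violations, len(PENALTY_LADDER) - 1)
    return PENALTY_LADDER[step]

print(next_penalty(0))  # ('warning', None)
print(next_penalty(2))  # ('temporary_ban', datetime.timedelta(days=7))
```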
Scaling
Congrats! You’re growing. The Product Manager and perhaps an assistant can no longer handle this on their own. Or maybe you had a community moderator already. But now? It’s time to hire. The options here are typically as follows:
- In-House Staff or External Contractor(s): Build your own team, contract out, or both. If you go with contractors, you might want or need more than one firm.
- External Review Board: Do you need one?
Crisis Management Plan
Regarding a crisis management plan, step one is to assume you’ll eventually have a crisis. Ideally you won’t need the plan. Hopefully every bit of time and money you spend on crisis management will be a total waste, much like the car and home fire insurance you hope never to use.
Be at least partially ready anyway. Have crisis management plans for when things go wrong. And they may go wrong during your local overnight hours on a weekend, when it’s harder to respond. Information will be picked up and spread faster than you can possibly respond. The old expression about a lie moving around the world before the truth can get its boots on now seems quaint, since things have become so much worse than that.
Have legal counsel available. Even if not in house or on retainer, you should already have the number ready to call. And you should have your mechanisms to respond ready to go for when you’ve got something to say.
Here’s a first tip from firefighting/EMS… what’s the first treatment for a burn patient? Some say treating for shock or pain or whatever. But the answer is really, “Step One: Stop the burning.”
Watch the Law
This is another pure Governance, Risk, Compliance cost to you. You need to monitor changes in law that might impact user generated content in general or possibly something about your specialty topic area in particular. Things do change and have been changing. Even though privacy laws might seem to be a separate topic, you might also need to deal with content takedown issues if identity is involved.
Make sure your public policy documents are compliant with any regulations, and that any internal or contractor moderation rule sets are aligned with such policies. You should also archive any actions taken should you face challenges later on.
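For the archiving point, a minimal sketch of what logging moderation actions might look like. The field names are hypothetical; the important property is an append-only record tying each action to an actor, a target, and the written policy clause applied.

```python
# Hypothetical moderation audit log: append-only JSON-lines records.
import json, time, uuid

def log_action(store_path: str, actor: str, action: str,
               target_id: str, reason: str, policy_ref: str) -> None:
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "actor": actor,           # which moderator or automated system acted
        "action": action,         # e.g. "remove", "restore", "ban"
        "target_id": target_id,   # the content or account affected
        "reason": reason,
        "policy_ref": policy_ref, # which written policy clause was applied
    }
    with open(store_path, "a", encoding="utf-8") as f:  # append-only file
        f.write(json.dumps(record) + "\n")

log_action("moderation_audit.jsonl", "mod_alice", "remove",
           "post_123", "spam link", "policy-4.2")
```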
Final Thoughts
Content moderation is a necessary investment for any product or platform that involves user-generated content. While it might feel like a distraction from your core vision, failing to address it proactively can lead to significant risks, both operational and reputational. Moderation is not just about protecting your users; it’s about safeguarding your brand, your platform’s integrity, and your long-term viability.
To succeed, start with a clear understanding of your goals and the specific risks your platform may face. Tailor your moderation policies and tools to align with your audience, industry, and business model. Scalability might end up being an issue. But it’s likely best to begin with simple, cost-effective solutions, and evolve as your platform grows. Whether leveraging AI, employing human moderators, or empowering your community, you’ll face the usual issues balancing quality, cost, and efficiency.
Remember, content moderation isn’t a one-and-done task. It requires ongoing attention, updates, and responsiveness to both user behavior and external threats. Crisis management and legal compliance are equally essential, as the consequences of neglect can extend far beyond the immediate issues.
Ultimately, thoughtful moderation builds trust, fosters community, and protects your product from unnecessary harm. By planning ahead and investing in the right mix of tools, processes, and policies, you’re not just mitigating risks, you’re laying the foundation for sustainable growth and success in an area that’s facing a lot of attention and change right now.
Extra Credit
Would you like to see up close how some AI tools can evaluate content?
One amazing thing about the tools available today is the ability to quickly, and essentially for free, investigate some of these technologies. There are free datasets of social content available, and evaluation models you can apply to them to get a sense of how they work. Technical coding skills are useful here, but not wholly necessary. If you want to try such tools yourself, there are services such as Google Colab (a notebook-style means of running code) and Hugging Face, which can provide both foundation text models and test datasets. I’ve built an annotated Google Colab notebook with a sample. The example does not require a Hugging Face account, but I’ve included instructions for creating one if you choose to try out other datasets. To run this Colab notebook code yourself, make a copy into your own Google Drive. (Disclaimer: I’m not a professional coder. By all means, if you are and you see problems in the code, please let me know. But in no way should anything I offer here be considered production quality.)
Here’s the test code exercise: ContentModerationTest.ipynb
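If you’d like to see the shape of the approach before opening the notebook, here’s a separate minimal sketch (not the notebook itself) using the Hugging Face transformers pipeline with a publicly available toxicity model. Model availability and label names vary over time, so treat the model name as an example and swap in another text-classification model if needed.

```python
# Minimal sketch: score sample comments with a public toxicity classifier.
# The model name is an example; availability and labels can change.
from transformers import pipeline

classifier = pipeline("text-classification", model="unitary/toxic-bert")

samples = [
    "Thanks for the detailed review, this really helped.",
    "You are a complete idiot and everyone hates you.",
]

for text in samples:
    result = classifier(text)[0]  # e.g. {'label': 'toxic', 'score': ...}
    print(f"{result['label']:>10} {result['score']:.2f}  {text}")
```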