Recent advances in AI technology have led to the development of highly sophisticated generative models. These tools could greatly improve efficiency across many industries, but in some contexts it is important to be able to identify AI-generated content.
Most of the following 10 AI content detectors focus on solving this problem for educational institutions and content publishers. A few others are devoted to academic NLP research or are simply general-purpose AI content detectors.
Below is an overview of the tools currently available for this purpose.
1. OpenAI AI Text Classifier
Models covered: A variety of models, including ChatGPT
The OpenAI Text Classifier attempts to distinguish between human-written and AI-generated text. It was trained on outputs from 34 models from 5 different organizations, so it appears to use a different model than detector #5 below.
The AI Classifier is built on OpenAI's GPT models, which rely on deep learning. Features of the OpenAI text classifier include:
- Integrates easily through the OpenAI API
- Splits content into tokens (words or word fragments) using NLP techniques
- Capable of performing text classification for a wide range of applications
- Requires a minimum of 1,000 characters as input
- Uses 5 labels to express its verdict, ranging from “Very unlikely AI-generated” (less than a 10% chance) to “Likely AI-generated” (more than a 98% chance)
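The five-label scale amounts to a threshold lookup on the classifier's probability. Only the outer cut-offs (10% and 98%) come from the description above; the middle labels and the 45% and 90% cut-offs follow the scale OpenAI described at launch and should be treated as an assumption, as should the exact boundary handling and the hypothetical helper name `label_for`.

```python
# (upper bound, label) pairs, checked in ascending order.
# ASSUMPTION: only the 10% and 98% cut-offs come from the article; the
# 45% and 90% cut-offs and the middle labels follow OpenAI's launch description.
THRESHOLDS = [
    (0.10, "Very unlikely AI-generated"),
    (0.45, "Unlikely AI-generated"),
    (0.90, "Unclear if it is AI-generated"),
    (0.98, "Possibly AI-generated"),
]


def label_for(probability: float) -> str:
    """Map a classifier probability (0..1) to one of five verdict labels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, label in THRESHOLDS:
        if probability < upper:
            return label
    # ASSUMPTION: 0.98 itself falls into the top band
    return "Likely AI-generated"


print(label_for(0.05))  # Very unlikely AI-generated
print(label_for(0.99))  # Likely AI-generated
```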
2. TrueText
Accuracy: Around 76%
Models covered: Any AI model
Interesting features: The model can be easily integrated into larger solutions using the BotX no-code AI development platform.
TrueText was built by Ivan Sivak, founder of AI startup BotX.
TrueText helps you identify whether a given text was generated by a large language model (LLM), including but not limited to OpenAI's GPT models. TrueText has a 76% success rate at detecting AI-generated text and combines several detection techniques.
BotX has announced plans to extend the model's features, for example by highlighting specific passages according to their probability of being AI-generated and by releasing a Chrome plug-in. The model already offers considerable flexibility, since it can easily be plugged into process workflows built on the BotX no-code AI development platform.
3. GPTZero
Models covered: Currently up to ChatGPT
Interesting features: Highlights particular text phrases or sentences that it evaluates to have been written by AI.
GPTZero was built by Edward Tian, a 22-year-old Princeton University student, and released at the start of January 2023.
Its main purpose is to help catch plagiarism in academia by flagging AI-generated assignments and papers.
The tool analyzes input text for two metrics that reflect its complexity and, ultimately, the probability that it was written by a human:
- Burstiness: the variation in sentence length and complexity across a text. AI-generated text tends to be uniform, whereas human writing mixes long and short sentences in less regular patterns.
- Perplexity: how unpredictable the text is to a language model, i.e., how hard it is for the model to predict each next word. Low perplexity (predictable word choices) points to machine-generated text, while higher perplexity suggests human writing.
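Perplexity has a standard definition: the exponential of the average negative log-probability that a language model assigns to each token. The sketch below computes it from a list of per-token natural-log probabilities, which are assumed to come from some language model (obtaining them is out of scope). It then treats burstiness as the spread of per-sentence perplexities, one common way to quantify it. GPTZero's exact formulas are not public, so this is an illustration of the metrics, not GPTZero's implementation.

```python
import math
import statistics


def perplexity(token_logprobs: list[float]) -> float:
    """exp(-mean(log p)) over natural-log token probabilities.

    Lower values mean the model found the text more predictable.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def burstiness(sentences: list[list[float]]) -> float:
    """Population standard deviation of per-sentence perplexities.

    ASSUMPTION: one common operationalization, not a published GPTZero formula.
    """
    return statistics.pstdev([perplexity(s) for s in sentences])


# Every token predicted with probability 0.5 -> perplexity of about 2
print(perplexity([math.log(0.5)] * 4))  # ≈ 2.0

# One predictable sentence (perplexity ≈ 2) and one less predictable (≈ 4)
print(burstiness([[math.log(0.5)] * 3, [math.log(0.25)] * 3]))  # ≈ 1.0
```

Uniform, equally predictable sentences drive burstiness toward zero, which matches the intuition above that AI text tends to be uniform.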
Some of GPTZero's features include:
- Requires at least 250 characters as input
- Offers file batch uploads
- Plagiarism detection
- Gives the average perplexity score and burstiness score in the result
4. Copyleaks AI Detector
Accuracy: Copyleaks claims 99.12% accuracy across multiple languages
Models covered: Copyleaks claims coverage of any generative AI model
Interesting features: API integrations and Chrome plug-in.
Copyleaks is a widely used plagiarism detection platform launched in 2015 by the company of the same name. It has recently been extended with an AI content detection module.
Other features of Copyleaks AI Detector include:
- Requires at least 150 characters of input
- Support for multiple languages, including English, Spanish, German, and French
- Customization options for adjusting the sensitivity of the detection algorithms and excluding specific sources from the comparison process.
5. Hugging Face AI Detectors
Hugging Face is an AI community and platform where users build and share AI models and apps. It currently hosts two different detectors: one that claims to have been created by the OpenAI community, and another by Arsh Kashyap, who goes by the name PirateXX.
OpenAI's Hugging Face AI detector
Models covered: GPT-2
This AI content detector identifies and categorizes GPT-2 output. It uses RoBERTa (Robustly Optimized BERT Pretraining Approach), a deep learning model that is a variant of BERT (Bidirectional Encoder Representations from Transformers).
This detector seems to be built mainly for academic purposes. Its creators have published GitHub repositories that anyone can explore to learn more.
The detector has a simpler UI than other detectors on the market.
Some of this detector's features include:
- Requires between 50 and 700 words of input text
- Built on top of the Transformers library
6. GLTR
Accuracy: 72% on GPT-2
Models covered: GPT-2
Interesting features: Set of features helpful for academic research into generative models.
GLTR (Giant Language Model Test Room) was developed by Hendrik Strobelt, Sebastian Gehrmann, and Alexander Rush of the MIT-IBM Watson AI Lab and Harvard NLP. It relies on GPT-2 to analyze text, so it cannot reliably detect content produced by generators built on the newer GPT-3.
GLTR is also a visual forensic tool: it highlights each word according to its rank, i.e., where that word falls in GPT-2's list of likely next words given the preceding text (rank 1 is the model's top prediction). It color-codes the ranks as follows:
- Green: for the top 10
- Yellow: for the top 100
- Red: for the top 1000
- Purple: for ranks above 1000
The tool also provides three histograms showing the model's confidence and how each class of words compares with the others.
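The color-coding rule above can be sketched in a few lines. The hypothetical `token_rank` helper assumes you already have a language model's next-token probabilities for one position (GLTR obtains these from GPT-2); `gltr_color` applies the four bands listed above; and `color_histogram` counts tokens per band, loosely mirroring GLTR's summary histograms. The function names and the dictionary input format are illustrative assumptions, not GLTR's code.

```python
from collections import Counter


def token_rank(next_token_probs: dict[str, float], actual: str) -> int:
    """1-based rank of the token that actually appears (1 = model's top guess)."""
    ordered = sorted(next_token_probs, key=next_token_probs.__getitem__, reverse=True)
    return ordered.index(actual) + 1  # raises ValueError if `actual` is missing


def gltr_color(rank: int) -> str:
    """Map a rank to the color bands described above."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"


def color_histogram(ranks: list[int]) -> Counter:
    """Count how many tokens fall into each color band."""
    return Counter(gltr_color(r) for r in ranks)


# Toy next-token distribution for a single position (made-up numbers)
probs = {"the": 0.5, "a": 0.3, "cat": 0.2}
print(token_rank(probs, "cat"))  # 3
print(color_histogram([1, 3, 42, 500, 2048]))
```

A text dominated by green tokens means the model would often have picked the same words, which is the signal GLTR surfaces visually.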
7. Originality
Accuracy: 94% (claimed by Originality)
Models covered: up to ChatGPT
Interesting features: Scans entire sites, including subpages, for AI-generated content when you enter a URL
Originality aims to help content publishers manage and review the quality and originality of the content they produce. This commercial tool costs $0.01 per 100 words.
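At the quoted rate, the cost of a scan grows linearly with word count. The sketch below assumes strictly proportional billing with no rounding or minimum charge, which the pricing above does not specify; `scan_cost` is a hypothetical helper. `Decimal` is used to avoid floating-point rounding errors in currency amounts.

```python
from decimal import Decimal

PRICE_PER_100_WORDS = Decimal("0.01")  # USD, the rate quoted above


def scan_cost(word_count: int) -> Decimal:
    """Estimated cost in USD, ASSUMING purely proportional billing."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    return PRICE_PER_100_WORDS * word_count / 100


# A 50,000-word site would cost about $5 under this assumption
print(scan_cost(50_000))
```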
Other key features:
- Ability to integrate through API
- Chrome plug-in that lets you watch writers as they produce content in your Google Docs
8. Turnitin
Accuracy: Unknown (not yet released as of February 2023)
Models covered: up to ChatGPT
Interesting features: Highlights particular text phrases or sentences that it evaluates to have been written by AI
Turnitin was created by students at the University of California, Berkeley. Today, it is one of the most popular plagiarism checkers among educators, researchers, and publishers, and it catches human-committed plagiarism with good accuracy.
More interesting features:
- Ability to upload documents to check
9. Content At Scale AI Detector
Models covered: up to ChatGPT
Content At Scale AI content detector was created by entrepreneur Justin McGill, a former copywriter. The tool is currently free to use, since its target market is mostly freelancers and individuals using it for personal purposes.
Its prediction assigns percentages to two labels: ‘Fake’, associated with AI-generated content, and ‘Real’, associated with human-written content.
Key features:
- Needs at least 25 words to make a judgment and can process up to 25,000 characters at a time
- Generates similar text with its text generator
- Compares the input text with the generated text
- Counts the overlapping words, reports the percentage of overlapping words as ‘Fake’, and reports the remaining percentage as ‘Real’
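The overlap scoring described in these steps can be sketched as follows. Content At Scale has not published how it tokenizes or matches words, so the lowercase regex tokenization and set-membership matching below are assumptions. The generation step is left out: the hypothetical `fake_real_split` function simply receives the generated text as an argument.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens (ASSUMPTION: simple regex tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fake_real_split(input_text: str, generated_text: str) -> tuple[float, float]:
    """Return (fake %, real %) based on word overlap with the generated text."""
    input_words = words(input_text)
    if not input_words:
        raise ValueError("input text contains no words")
    generated_vocab = set(words(generated_text))
    # Count input words that also appear anywhere in the generated text
    overlap = sum(1 for w in input_words if w in generated_vocab)
    fake = 100 * overlap / len(input_words)
    return fake, 100 - fake


fake, real = fake_real_split("The cat sat on the mat", "The dog sat on a rug")
print(f"Fake: {fake:.1f}%, Real: {real:.1f}%")  # Fake: 66.7%, Real: 33.3%
```

Four of the six input words ("the" twice, "sat", "on") also occur in the generated text, so two thirds of the score lands on ‘Fake’.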
10. Writer AI Content Detector
Accuracy: Low for detecting human-generated content
Models covered: Unknown
Writer's AI content detection tool mainly aims to help publishers who do publish AI-generated content adjust it so that less of it is flagged as AI-generated.