This page lists the error categories and subtypes used to generate realistic but incorrect SQL queries.
The error taxonomy was created specifically for this project, based on research into common SQL mistakes and real-world error patterns. Its hierarchy of error types helps keep the generated incorrect queries diverse and representative of the mistakes people actually make when writing SQL.
The taxonomy was developed by analyzing:
When the OpenAI GPT-4o model is used to generate incorrect queries, this taxonomy guides it to produce errors that closely mimic real human mistakes rather than random alterations.
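As a rough illustration of how a taxonomy entry might steer generation, the sketch below turns a category description into a prompt instruction. The category keys, the `build_prompt` helper, and the prompt wording are all hypothetical; the project's actual prompt is not shown on this page, and no model call is made here.

```python
# Hypothetical category keys; the descriptions are the ones given on this page.
TAXONOMY = {
    "join": "Errors in table joins",
    "where": "Errors in WHERE clause conditions",
    "aggregation": "Errors in aggregate functions and grouping",
    "column_selection": "Errors in column selection",
    "ordering": "Errors in result ordering",
    "subquery": "Errors in subqueries",
    "limit": "Errors in limiting results",
    "null_handling": "Errors in handling NULL values",
    "syntax": "Errors in SQL syntax (but still executable)",
    "semantic": "Errors in query meaning",
}


def build_prompt(correct_sql: str, category: str) -> str:
    """Assemble an instruction asking for one error from the given category."""
    description = TAXONOMY[category]  # raises KeyError for unknown categories
    return (
        "Rewrite the SQL query below so that it still runs but is incorrect.\n"
        f"Introduce exactly one mistake of this type: {description}.\n"
        "Return only the modified query.\n\n"
        f"{correct_sql}"
    )


# Example: request a WHERE-clause error for a simple query (assumed schema).
prompt = build_prompt("SELECT name FROM customers WHERE city = 'Oslo'", "where")
print(prompt)
```

Naming one category per request is one plausible way to keep the generated errors targeted instead of arbitrary.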
Description: Errors in table joins
Description: Errors in WHERE clause conditions
Description: Errors in aggregate functions and grouping
Description: Errors in column selection
Description: Errors in result ordering
Description: Errors in subqueries
Description: Errors in limiting results
Description: Errors in handling NULL values
Description: Errors in SQL syntax (but still executable)
Description: Errors in query meaning
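To make a few of these categories concrete, the sketch below runs correct and incorrect query pairs against an in-memory SQLite database. The `customers` and `orders` tables, their rows, and the specific mistakes are illustrative assumptions, not examples taken from the project. Each incorrect query executes without error but returns different rows from its correct counterpart.

```python
import sqlite3

# Assumed toy schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada', 'Paris'), (2, 'Bo', NULL), (3, 'Cy', 'Oslo');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 3, 10.0);
""")

# category -> (correct query, incorrect query); the pairs are hypothetical examples.
PAIRS = {
    # Join error: INNER JOIN silently drops customers that have no orders.
    "join": (
        "SELECT c.name, o.amount FROM customers c "
        "LEFT JOIN orders o ON o.customer_id = c.id",
        "SELECT c.name, o.amount FROM customers c "
        "INNER JOIN orders o ON o.customer_id = c.id",
    ),
    # NULL-handling error: '= NULL' is never true, so no rows match.
    "null_handling": (
        "SELECT name FROM customers WHERE city IS NULL",
        "SELECT name FROM customers WHERE city = NULL",
    ),
    # Aggregation error: COUNT used where SUM was intended.
    "aggregation": (
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
        "SELECT customer_id, COUNT(amount) FROM orders GROUP BY customer_id",
    ),
}


def run(sql: str) -> list:
    """Execute a query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


for category, (correct, incorrect) in PAIRS.items():
    good, bad = run(correct), run(incorrect)
    print(f"{category}: results differ = {good != bad}")
```

Comparing result sets like this is also one simple way to check that a generated query is actually wrong and not just a rewritten equivalent of the original.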