Error Taxonomy

This page lists the error categories and subtypes used to generate realistic but incorrect SQL queries.

About the Error Taxonomy

The taxonomy was created for this project, drawing on research into common SQL mistakes and real-world error patterns. Its hierarchy of categories and subtypes helps keep the generated incorrect queries diverse and representative of the mistakes people actually make when writing SQL.

The taxonomy was developed by analyzing:

  • Research papers on SQL error patterns
  • Common mistakes observed in SQL learning environments
  • Expert knowledge about semantic vs. syntactic SQL errors

When the OpenAI GPT-4o model is used to generate incorrect queries, the taxonomy steers it toward errors that closely mimic real human mistakes rather than random alterations.
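One way to picture how a taxonomy can guide generation is sketched below. It is an assumption-laden illustration, not the project's actual code: the `TAXONOMY` dictionary, the `build_prompt` helper, and the prompt wording are all hypothetical, only two categories are included, and no model is called. The keys reuse the category descriptions from this page.

```python
import random

# Hypothetical in-memory form of the taxonomy: description -> subtypes.
# Only two categories are shown here to keep the sketch short.
TAXONOMY = {
    "Errors in table joins": [
        "Incorrect Join Columns", "Missing Join",
        "Wrong Join Type", "Unnecessary Join",
    ],
    "Errors in handling NULL values": [
        "Incorrect Null Comparison", "Missing Null Check",
        "Unnecessary Null Check",
    ],
}


def build_prompt(question, correct_sql, rng):
    """Pick one category and subtype, then phrase a request for that error."""
    category = rng.choice(sorted(TAXONOMY))
    subtype = rng.choice(TAXONOMY[category])
    prompt = (
        f"Question: {question}\n"
        f"Correct SQL: {correct_sql}\n"
        f"Rewrite the query so it contains this mistake: "
        f"{subtype} ({category}). The query must still execute."
    )
    return category, subtype, prompt


rng = random.Random(0)  # seeded so repeated runs pick the same subtype
category, subtype, prompt = build_prompt(
    "List every customer name.", "SELECT name FROM customers", rng
)
print(prompt)
```

Sampling a named subtype, instead of asking for "any error", is what keeps the requested mistakes spread across the hierarchy.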

Description: Errors in table joins

Subtypes:
  • Incorrect Join Columns
  • Missing Join
  • Wrong Join Type
  • Unnecessary Join
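As a rough illustration of the join category, the sketch below contrasts a correct query with a Wrong Join Type variant. The `customers` and `orders` tables, their rows, and the question are invented for this example; Python's `sqlite3` module is used only so the queries can be run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0);
""")

# Question: list every customer with their order total, if they have one.
correct = """SELECT c.name, o.total FROM customers c
             LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id"""
# Wrong Join Type: an inner join silently drops customers without orders.
wrong = """SELECT c.name, o.total FROM customers c
           JOIN orders o ON o.customer_id = c.id ORDER BY c.id"""

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [('Ada', 25.0), ('Grace', None)]
print(wrong_rows)    # [('Ada', 25.0)]
```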

Description: Errors in WHERE clause conditions

Subtypes:
  • Incorrect Comparison Operator
  • Missing Condition
  • Wrong Logical Operator
  • Comparing Wrong Columns
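A minimal sketch of an Incorrect Comparison Operator error follows. The `products` table and its prices are made up for illustration, and the choice of `>` in place of `>=` is just one plausible instance of this subtype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, price REAL);
    INSERT INTO products VALUES ('pen', 5.0), ('book', 20.0), ('lamp', 35.0);
""")

# Question: which products cost 20 or more?
correct = "SELECT name FROM products WHERE price >= 20 ORDER BY name"
# Incorrect Comparison Operator: '>' excludes the boundary value 20.
wrong = "SELECT name FROM products WHERE price > 20 ORDER BY name"

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [('book',), ('lamp',)]
print(wrong_rows)    # [('lamp',)]
```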

Description: Errors in aggregate functions and grouping

Subtypes:
  • Wrong Aggregate Function
  • Missing Group By
  • Incorrect Having Clause
  • Aggregating Wrong Column
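The following sketch shows what a Wrong Aggregate Function error might look like. The `employees` table and salary figures are hypothetical; the point is only that both queries run while answering different questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('eng', 100), ('eng', 120), ('ops', 90);
""")

# Question: what is the total salary per department?
correct = "SELECT dept, SUM(salary) FROM employees GROUP BY dept ORDER BY dept"
# Wrong Aggregate Function: COUNT returns head counts, not salary totals.
wrong = "SELECT dept, COUNT(salary) FROM employees GROUP BY dept ORDER BY dept"

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [('eng', 220), ('ops', 90)]
print(wrong_rows)    # [('eng', 2), ('ops', 1)]
```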

Description: Errors in column selection

Subtypes:
  • Selecting Wrong Columns
  • Missing Important Column
  • Redundant Columns
  • Incorrect Column Alias
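For the column selection category, a Selecting Wrong Columns error could be as simple as the sketch below. The `users` table and its contents are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO users VALUES (1, 'Ada', 'ada@example.com'),
                             (2, 'Grace', 'grace@example.com');
""")

# Question: what are the email addresses of all users?
correct = "SELECT email FROM users ORDER BY id"
# Selecting Wrong Columns: the query returns names where emails were asked for.
wrong = "SELECT name FROM users ORDER BY id"

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [('ada@example.com',), ('grace@example.com',)]
print(wrong_rows)    # [('Ada',), ('Grace',)]
```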

Description: Errors in result ordering

Subtypes:
  • Wrong Order By Column
  • Incorrect Sort Direction
  • Missing Order By
  • Unnecessary Ordering
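An Incorrect Sort Direction error returns the same rows in the opposite order, which matters whenever the question implies a ranking. The `scores` table below is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, points INTEGER);
    INSERT INTO scores VALUES ('a', 10), ('b', 30), ('c', 20);
""")

# Question: rank players from highest to lowest score.
correct = "SELECT player, points FROM scores ORDER BY points DESC"
# Incorrect Sort Direction: ascending order puts the lowest score first.
wrong = "SELECT player, points FROM scores ORDER BY points ASC"

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [('b', 30), ('c', 20), ('a', 10)]
print(wrong_rows)    # [('a', 10), ('c', 20), ('b', 30)]
```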

Description: Errors in subqueries

Subtypes:
  • Unnecessary Subquery
  • Missing Necessary Subquery
  • Incorrect Correlation
  • Subquery Wrong Columns
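One reading of Incorrect Correlation is a subquery that should depend on the outer row but does not. The sketch below assumes that interpretation; the `employees` table and salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ann', 'eng', 100), ('Bob', 'eng', 140),
                                 ('Cy', 'ops', 60), ('Di', 'ops', 80);
""")

# Question: who earns more than the average of their own department?
correct = """SELECT e.name FROM employees e
             WHERE e.salary > (SELECT AVG(d.salary) FROM employees d
                               WHERE d.dept = e.dept)
             ORDER BY e.name"""
# Incorrect Correlation: the subquery ignores the outer row's department,
# so salaries are compared against the company-wide average (95).
wrong = """SELECT name FROM employees
           WHERE salary > (SELECT AVG(salary) FROM employees)
           ORDER BY name"""

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [('Bob',), ('Di',)]
print(wrong_rows)    # [('Ann',), ('Bob',)]
```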

Description: Errors in limiting results

Subtypes:
  • Missing Limit
  • Incorrect Limit Value
  • Missing Offset
  • Incorrect Offset
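Pagination is a common place for limit and offset mistakes. The sketch below shows an Incorrect Offset error as an off-by-one slip; the `items` table and the page size of 2 are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    INSERT INTO items VALUES (1), (2), (3), (4), (5);
""")

# Question: return the second page of items, two items per page.
correct = "SELECT id FROM items ORDER BY id LIMIT 2 OFFSET 2"
# Incorrect Offset: skipping one row instead of two overlaps with page one.
wrong = "SELECT id FROM items ORDER BY id LIMIT 2 OFFSET 1"

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [(3,), (4,)]
print(wrong_rows)    # [(2,), (3,)]
```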

Description: Errors in handling NULL values

Subtypes:
  • Incorrect Null Comparison
  • Missing Null Check
  • Unnecessary Null Check
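The classic Incorrect Null Comparison is writing `= NULL` where `IS NULL` is needed. In SQL, comparing anything to NULL with `=` yields NULL rather than true, so the filter matches no rows. The `tasks` table below is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (title TEXT, done_at TEXT);
    INSERT INTO tasks VALUES ('write report', NULL),
                             ('file taxes', '2024-01-01');
""")

# Question: which tasks are not finished yet?
correct = "SELECT title FROM tasks WHERE done_at IS NULL"
# Incorrect Null Comparison: '= NULL' evaluates to NULL, never to true.
wrong = "SELECT title FROM tasks WHERE done_at = NULL"

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [('write report',)]
print(wrong_rows)    # []
```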

Description: Errors in SQL syntax (but still executable)

Subtypes:
  • Incorrect Alias
  • Keyword Misuse
  • Missing Parentheses
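A Missing Parentheses error can leave a query perfectly executable while changing how its conditions group, because `AND` binds more tightly than `OR`. The `tickets` table in this sketch is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, priority TEXT);
    INSERT INTO tickets VALUES (1, 'open', 'high'), (2, 'closed', 'urgent'),
                               (3, 'open', 'low');
""")

# Question: which open tickets are high or urgent priority?
correct = """SELECT id FROM tickets
             WHERE status = 'open' AND (priority = 'high' OR priority = 'urgent')
             ORDER BY id"""
# Missing Parentheses: parsed as (open AND high) OR urgent,
# so the closed urgent ticket slips through.
wrong = """SELECT id FROM tickets
           WHERE status = 'open' AND priority = 'high' OR priority = 'urgent'
           ORDER BY id"""

correct_rows = conn.execute(correct).fetchall()
wrong_rows = conn.execute(wrong).fetchall()
print(correct_rows)  # [(1,)]
print(wrong_rows)    # [(1,), (2,)]
```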

Description: Errors in query meaning

Subtypes:
  • Solving Wrong Problem
  • Misunderstanding Question
  • Incorrect Approach
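Errors in query meaning are harder to pin down because the SQL itself is well formed. The sketch below shows one possible case of Misunderstanding Question, where "how many customers" is read as "how many orders". The `orders` table, its rows, and the mapping to this subtype are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    INSERT INTO orders VALUES (1, 'Ada'), (2, 'Ada'), (3, 'Grace');
""")

# Question: how many customers have placed an order?
correct = "SELECT COUNT(DISTINCT customer) FROM orders"
# Misunderstanding Question: this counts orders, not customers.
wrong = "SELECT COUNT(*) FROM orders"

correct_count = conn.execute(correct).fetchone()[0]
wrong_count = conn.execute(wrong).fetchone()[0]
print(correct_count)  # 2
print(wrong_count)    # 3
```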