Star History Weekly

#14 🌎 Imagine having ChatGPT write correct and useful SQL queries for you!

Hi and welcome to Star History Weekly #14!

If you are new, it's Mila here. Each week, we curate some open-source news and take you to explore an open-source project, the Starlets. Meanwhile, we share fascinating GitHub repos daily over at @StarHistoryHQ, so make sure to follow us if you haven't already.

If you like this newsletter, we ask you to subscribe and share!

In Issue #14, we've compiled a collection of open-source Text2SQL tools to try out.

Text2SQL (or Chat2SQL) tools convert natural-language questions into SQL queries. Imagine having ChatGPT write beautiful, correct, and useful SQL queries for you!

These tools started out bridging the gap between non-technical users and databases: they let people interact with a database in natural language, which lowers the barrier to accessing and analyzing data. With the advance of AI models, they now support more advanced features, such as handling complex queries, joining multiple tables, and even holding natural-language conversations.

They can also help improve productivity by automating the process of generating SQL queries, thereby saving time and effort.

In this edition of Star History monthly, we have compiled a collection of open-source Text2SQL tools.

Chat2DB

Chat2DB aims to be a general-purpose SQL client and reporting tool that incorporates AI capabilities from the start. It supports connection to a handful of databases including MySQL, Postgres, Oracle, SQL Server, SQLite, ClickHouse, and more.

There was a bit of drama involving Chat2DB a while ago. We won't get into the details here, but I'm curious to know what you think.

SQL Chat

SQL Chat is a chat-based SQL client: you talk to your database in natural language to query, modify, add, and delete (!) data.

It currently supports MySQL, Postgres, SQL Server, and TiDB Serverless.

It's open-sourced by Bytebase, a database migration tool for teams.

Vanna

Vanna is a Python framework that lets you train a RAG model with queries, DDL, and documentation from a database.

You can use Vanna as is, or build your custom UI with an existing tool (e.g. Streamlit, Slack).

It was open-sourced in July 2023 and got popular this January.
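To make the retrieval idea behind a tool like Vanna more concrete, here is a minimal sketch of how stored DDL and documentation snippets might be picked and placed into a prompt. This is not Vanna's API: `tokenize` and `build_prompt` are hypothetical helpers, and plain word overlap stands in for the embedding-based similarity search a real RAG setup would likely use.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def build_prompt(question, snippets, top_k=2):
    """Pick the top_k snippets sharing the most words with the question.

    Word overlap is an assumption made for this sketch; it stands in for
    a proper similarity search over embeddings.
    """
    question_tokens = tokenize(question)
    ranked = sorted(
        snippets,
        key=lambda snippet: len(question_tokens & tokenize(snippet)),
        reverse=True,
    )
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}\nSQL:"


# Hypothetical "training data": DDL and a line of documentation.
snippets = [
    "CREATE TABLE orders (id INT, customer_id INT, total REAL)",
    "CREATE TABLE employees (id INT, name TEXT)",
    "-- total is the order value in USD",
]
prompt = build_prompt("What is the total of all orders?", snippets)
print(prompt)
```

The resulting prompt carries only the snippets relevant to the question, which is what lets the model write SQL against tables it has never seen in training.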

DuckDB-NSQL

DuckDB-NSQL is a Text2SQL LLM built for local DuckDB SQL analytics tasks by MotherDuck and Numbers Station. It can help users leverage the full power of DuckDB and its analytic potential without going back and forth between the DuckDB documentation and the SQL shell.

LangChain

With LangChain, you can build a Q&A chain and agent over an SQL database yourself.

LangChain also has an SQL Agent that you can add to the chain. It can not only answer questions based on the databases’ schema and content but also recover from errors by running a generated query, catching the traceback, and regenerating it correctly.
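The error-recovery loop described above can be sketched in a few lines. This is not LangChain's implementation or API: `ask_with_retry` is a hypothetical helper, `fake_model` is a stand-in for a real LLM call, and SQLite is used only so the example runs on its own.

```python
import sqlite3


def ask_with_retry(conn, question, generate_sql, max_attempts=3):
    """Generate a query, run it, and feed any database error back to the generator.

    generate_sql(question, previous_sql, error) is a hypothetical callable
    standing in for an LLM; it is not part of any real library.
    """
    sql, error = None, None
    for _ in range(max_attempts):
        sql = generate_sql(question, sql, error)
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the error message so the next attempt can correct the query.
            error = str(exc)
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {error}")


def fake_model(question, previous_sql, error):
    """Stand-in model: the first guess uses a wrong table name, the retry fixes it."""
    if error is None:
        return "SELECT COUNT(*) FROM user"
    return "SELECT COUNT(*) FROM users"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("Ada",), ("Linus",)])

final_sql, rows = ask_with_retry(conn, "How many users are there?", fake_model)
print(final_sql, rows)
```

Capping the number of attempts matters here: without it, a model that keeps producing the same broken query would loop forever.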

Awesome Text2SQL

Awesome Text2SQL is a suite of curated tutorials and resources for LLMs, Text2SQL, Text2DSL, Text2API, Text2Vis, and more. Most of the models are LLM+Text2SQL, and for each model, there are links for papers, code, and datasets. If you want to dive deep into Text2SQL, take a look.

To Wrap Up

LLM or not, you should still be extra careful when executing model-generated SQL queries. Some ways to minimize risk include describing your database schema and data, constraining the size of the output, and validating and reviewing the generated SQL queries before executing them.
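As one way to picture the "validate and constrain" advice, here is a small sketch of a guard that only lets a single SELECT statement through and caps the number of rows returned. `run_generated_sql` is a hypothetical helper written for this example, SQLite is assumed purely for convenience, and the check is deliberately conservative (it rejects CTEs starting with WITH and any query containing a semicolon, even inside a string literal). It complements a human review and a read-only database user; it does not replace them.

```python
import sqlite3


def run_generated_sql(conn, sql, max_rows=100):
    """Run a model-generated query only if it is a single SELECT; cap the rows returned."""
    statement = sql.strip().rstrip(";").strip()
    # A remaining semicolon suggests several statements chained together.
    if ";" in statement:
        raise ValueError("only a single statement is allowed")
    # Anything that is not a plain SELECT (INSERT, UPDATE, DROP, ...) is refused.
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(statement)
    # fetchmany constrains the size of the output regardless of the query.
    return cursor.fetchmany(max_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

rows = run_generated_sql(conn, "SELECT x FROM t ORDER BY x;", max_rows=3)
print(rows)
```

A stricter setup would parse the SQL properly instead of checking prefixes, but even a blunt filter like this stops the most obvious destructive statements.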

Lastly

If you want more AI, check out earlier editions of the Star History open-source monthly: