Open-source AI data analyst - tutorial to set one up in ~45 minutes

I’m one of the builders behind this, and I'm happy to answer questions or discuss better ways to approach it.

There's a lot of hype around AI data analysts right now and honestly most of it is vague. We wanted to make something concrete, a tutorial that walks you through building one yourself using open-source tools. At least this way you can test something out without too much commitment.

The way it works: you run a few terminal commands that automatically import your database schema and create local YAML files representing your tables. The tool then analyzes your actual data and generates column descriptions, tags, quality checks, etc. The result is basically a context layer that the AI can read before it writes any SQL.
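To make the "context layer" idea concrete, here is a rough sketch of the general technique, not Bruin's actual implementation or file format. It assumes a SQLite database as a stand-in for your warehouse, and the names `build_context` and `to_yaml` are made up for illustration. It reads the schema, profiles each column, and suggests basic quality checks from what it finds in the data.

```python
import sqlite3


def build_context(conn: sqlite3.Connection) -> dict:
    """Hypothetical helper: collect schema plus a simple data profile per table."""
    context = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        columns = {}
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _, name, col_type, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
            nulls, distinct = conn.execute(
                f'SELECT SUM("{name}" IS NULL), COUNT(DISTINCT "{name}") FROM "{table}"'
            ).fetchone()
            nulls = nulls or 0  # SUM returns NULL on an empty table
            checks = []
            if total and nulls == 0:
                checks.append("not_null")  # no NULLs observed in the data
            if total and distinct == total:
                checks.append("unique")  # every row has a distinct non-NULL value
            columns[name] = {"type": col_type, "checks": checks}
        context[table] = {"row_count": total, "columns": columns}
    return context


def to_yaml(context: dict) -> str:
    """Hypothetical helper: render the context as YAML-style text (illustrative layout)."""
    lines = []
    for table, info in context.items():
        lines.append(f"{table}:")
        lines.append(f"  row_count: {info['row_count']}")
        lines.append("  columns:")
        for name, col in info["columns"].items():
            lines.append(f"    {name}:")
            lines.append(f"      type: {col['type']}")
            lines.append(f"      checks: [{', '.join(col['checks'])}]")
    return "\n".join(lines)


# Tiny in-memory example with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, coupon TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ann", None), (2, "bob", "SAVE10"), (3, "ann", None)],
)
context = build_context(conn)
print(to_yaml(context))
```

Checks inferred this way are only suggestions based on the current data, so you'd still want to review them before relying on them.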

You connect it to your coding agent via Bruin MCP and write an AGENTS.md with your domain-specific context like business terms, data caveats, query guidelines (similar to an onboarding doc for new hires).

It's definitely not magic, and it won't revolutionize your existing workflows, since data scientists already know how to do the more complex analysis. But there's always the boring part of getting started and doing the initial analysis. We wanted to give you a guide that lets you get going quickly and test it out.

I'm always happy to hear how you enrich your context layer and what kind of information you add.

submitted by /u/PolicyDecent