Let me define the problem statement first:
Users/managers constantly ask for slightly customized metrics or weird date ranges that standard dashboards don’t cover. Hardcoding a new UI chart for every edge-case query is unscalable.
Sounds simple enough problem, right ?
We could just make some chatbot API.
Well it is simple if you ignore the security risk that it could be, plus how could a chatbot possibly provide visualizations?
I was asking these questions to myself as well.
So let’s start from the very beginning and walk our way through it.
Choosing a LLM
As this was experimental project, I wanted something that is cheap and dependable if the project works.
I could’ve gone with something like “Open Router” which can provide multiple models and I could switch between them.
But I choose Google, specifically because of their open gemma4 models.
Those models weren’t only cheap but also very reliable with my testing.
Not only that but Gemini can provide JSON formatted Data in the format I want which is a game changer but competitors like openai can also do the same.
Sanitizing the data
For this I wrote a strict prompt which should block most prompt injection, I also wrote some regex which blocks harmful prompts such as DROP, INSERT and such.
But regex alone isn’t very reliable.
I made a new user which only has read access to the tables that are useful to LLM, so even if bad actor manages to skip every check before, they still won’t be able to run anything malicious.
Storing the Data
So we’ve integrated chatbot, and we’ve sanitized the data. Now how do we store it ?
I designed a structure which is very extensible.
First a Chat Room table to store History of all the chats
CREATE TABLE chat_bot.chatroom (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
chat_title TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
is_share BOOLEAN NOT NULL DEFAULT FALSE
);
And storing messages itself I did this:
CREATE TABLE chat_bot.chatmessage (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
room_id UUID NOT NULL,
role VARCHAR(10) NOT NULL,
content TEXT NOT NULL,
sql_query TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
which_type TEXT,
is_shared BOOLEAN,
is_saved BOOLEAN DEFAULT FALSE
)
- Here role is either user/bot
- content is the text output from bot or request from user
- sql_query: is query from response from bot
- we’ll later discuss is_saved & which_type
Visualizing the Data
This is everything that client sees, for this I won’t be going into UI but rather how I’m getting information.
which_type from our database discussion gives us type of data to show in client.
say, a table, a line graph, a pie chart, all of which comes directly from gemma.
And the is_saved bit also very interesting,
This flag allows user to place all the things that they gathered from the bot and save it in the dashboard and because I’m saving everything in SQL Query, every chart and tables are up to date.
Future Scope & Conclusion
While gemma-26B is relatively cheap, it still costs real money and isn’t very easy to scale up. So as we gather more data relying on vector database would be much more feasible and would be very cheap. But that’s a far thought for now.
Reach my DMs if you have anything particular to ask!