Nirvana: LLM-powered Semantic Data Analytics Programming Framework
Nirvana is an LLM-powered semantic data analytics programming framework that enables semantic queries over multi-modal data (e.g., text, images, audio). It provides a pandas-like interface with semantic operators that use large language models to process data according to natural language instructions. It also lets an optimizer find an execution plan for a given query that strikes a balance between quality, runtime, and cost. With Nirvana, users focus on what they want to do rather than how to achieve it.
Step 0: Install Nirvana and set up the initial LLM
Before you start using Nirvana's features, set up a default LLM. Taking gpt-4o as an example, you can authenticate either by setting the OPENAI_API_KEY environment variable or by passing api_key when you configure the LLM.
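The two authentication routes mentioned above (an environment variable or an explicitly passed key) can be sketched in plain Python. The helper name `resolve_api_key` is hypothetical and is not part of Nirvana; it only illustrates the usual precedence, where an explicit key wins over the environment.

```python
import os


def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY"):
    """Hypothetical helper: prefer an explicitly passed key, else read env_var."""
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of failing on the first LLM call.
        raise RuntimeError(f"Set {env_var} or pass an API key explicitly.")
    return key
```

The returned value is what you would hand to the framework's LLM setup as api_key; check the Nirvana documentation for the actual configuration call.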
Apply Semantic Operators to DataFrame
Suppose you have a simple semantic processing task and want to get results in a few lines of code. You can apply the function wrappers of semantic operators directly to your data frame. Here is an example.
Extract the genre from the movie overview
Possible Output:
More uses of semantic operators can be found in operators.
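To make the idea of a semantic operator as a function wrapper concrete, here is a minimal, framework-independent sketch of a "map" operator. It does not use Nirvana's API: `semantic_map`, `keyword_stub_llm`, and the two movie rows are all made up for illustration, and a keyword-matching stub stands in for a real model so the example runs offline.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def semantic_map(
    rows: List[Row],
    instruction: str,
    input_column: str,
    output_column: str,
    llm: Callable[[str], str],
) -> List[Row]:
    """Hypothetical operator: ask llm to apply instruction to each row's input_column."""
    result = []
    for row in rows:
        prompt = f"{instruction}\n\nInput: {row[input_column]}"
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[output_column] = llm(prompt)
        result.append(new_row)
    return result


def keyword_stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to keep the sketch runnable."""
    text = prompt.lower()
    if "spaceship" in text or "alien" in text:
        return "Science Fiction"
    if "detective" in text or "murder" in text:
        return "Crime"
    return "Unknown"


# Illustrative rows, not taken from any real dataset.
movies = [
    {"title": "Movie A", "overview": "A crew aboard a spaceship encounters an alien signal."},
    {"title": "Movie B", "overview": "A detective investigates a murder in a small town."},
]

with_genre = semantic_map(
    movies,
    "Extract the genre from the movie overview.",
    "overview",
    "genre",
    keyword_stub_llm,
)
for row in with_genre:
    print(row["title"], "->", row["genre"])
```

Swapping `keyword_stub_llm` for a function that calls a real model is the only change needed to turn the sketch into an actual semantic map; the operator itself only builds prompts and collects answers.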
Enable Query Optimization
If you have a complex semantic query over large datasets, you probably want to process it faster and at lower cost. In this case, Nirvana enables lazy execution and query optimization to automatically find a plan that reduces runtime and monetary cost. Here is a usage example.
For details and usage of query optimization, refer to optimization.
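As a rough illustration of why lazy execution helps, the toy sketch below records operators in a plan and only runs them on `collect`. Before running, a single rewrite rule moves a filter ahead of a map whenever the filter does not read the map's output, so the expensive map touches fewer rows. This is not Nirvana's optimizer or API: `LazyPlan`, `Op`, and `push_filters_early` are hypothetical names, the lambdas stand in for LLM calls, and the data is made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Op:
    kind: str    # "map" or "filter"
    reads: str   # column the operator reads
    writes: str  # column the operator writes ("" for filters)
    fn: Callable[[str], object]


def push_filters_early(ops: List[Op]) -> List[Op]:
    """Move a filter ahead of the preceding map when it does not read that map's output."""
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops) - 1):
            a, b = ops[i], ops[i + 1]
            if a.kind == "map" and b.kind == "filter" and b.reads != a.writes:
                ops[i], ops[i + 1] = b, a
                changed = True
    return ops


@dataclass
class LazyPlan:
    rows: List[Dict[str, object]]
    ops: List[Op] = field(default_factory=list)

    def map(self, reads, writes, fn):
        self.ops.append(Op("map", reads, writes, fn))
        return self

    def filter(self, reads, fn):
        self.ops.append(Op("filter", reads, "", fn))
        return self

    def collect(self, optimize=True):
        """Run the recorded plan; return (rows, number of simulated LLM calls)."""
        ops = push_filters_early(self.ops) if optimize else list(self.ops)
        rows = [dict(r) for r in self.rows]
        calls = 0
        for op in ops:
            calls += len(rows)  # one simulated LLM call per row per operator
            if op.kind == "map":
                for r in rows:
                    r[op.writes] = op.fn(r[op.reads])
            else:
                rows = [r for r in rows if op.fn(r[op.reads])]
        return rows, calls


reviews = [
    {"text": "great plot", "lang": "en"},
    {"text": "intrigue formidable", "lang": "fr"},
    {"text": "boring pacing", "lang": "en"},
    {"text": "gran final", "lang": "es"},
]
plan = (
    LazyPlan(reviews)
    .map("text", "sentiment", lambda t: "positive" if "great" in t else "negative")
    .filter("lang", lambda lang: lang == "en")
)
naive_rows, naive_calls = plan.collect(optimize=False)
fast_rows, fast_calls = plan.collect(optimize=True)
print(naive_calls, fast_calls)
```

Both runs return the same rows, but the reordered plan issues fewer simulated calls because the map runs only on rows that survive the filter. A real optimizer would weigh many more options, including the quality, runtime, and cost trade-off described at the top of this page.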