What is PandasAI?
PandasAI is a Python platform for interacting with your data in natural language. It uses large language models (LLMs) to make data analysis conversational and accessible to users regardless of their technical background: instead of writing analysis code, you ask questions about your data and receive answers in natural language.
What are the features of PandasAI?
- Conversational Data Analysis: Users can engage with their databases in a natural and intuitive way. By simply asking questions, you can retrieve insights and analytics without needing to write complex queries.
- Multi-format Support: PandasAI works with a range of data sources, including SQL databases, CSV files, pandas DataFrames, Polars, MongoDB, and other NoSQL databases. This flexibility lets users from different backgrounds use the tool.
- Integration of LLMs: The platform supports LLMs such as GPT-3.5, GPT-4, Anthropic models, and VertexAI, with the aim of giving users accurate and contextually relevant responses to their queries.
- Data Visualization: With PandasAI, you can automatically generate charts and visualizations to better understand your data. Simply ask for a specific chart, and PandasAI creates it for you.
- Multiple DataFrame Handling: Users can query multiple DataFrames simultaneously, allowing for complex data analyses that involve aggregating or comparing datasets side by side.
- Privacy Features: To safeguard sensitive information, PandasAI includes an option to enforce privacy, ensuring that only essential data is analyzed without exposing confidential details.
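The Multi-format Support and Multiple DataFrame Handling items above can be illustrated with a minimal sketch. It does not use any PandasAI connector (none is shown in this document); it assumes the generic route of loading each source into a pandas DataFrame first. The in-memory CSV text, the SQLite table `sales`, and the sample rows are invented for illustration.

```python
import io
import sqlite3

import pandas as pd

# CSV source: an in-memory string stands in for a file on disk (hypothetical data).
csv_text = "country,revenue\nFrance,2900\nGermany,4100\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# SQL source: an in-memory SQLite table stands in for a real database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Japan", 4500), ("China", 7000)],
)
sql_df = pd.read_sql_query("SELECT country, revenue FROM sales", conn)
conn.close()

# Both sources are now ordinary DataFrames with the same columns.
print(csv_df.shape, sql_df.shape)
```

Once the data is in DataFrames, it could be handed to an agent in the same way the usage section of this document does with `Agent([employees_df, salaries_df])`.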
What are the characteristics of PandasAI?
- User-Friendly Interface: Designed for both technical and non-technical users, PandasAI bridges the gap between complex data analysis and easy access.
- Rapid Deployment: The platform can be deployed effortlessly across various environments, including Jupyter notebooks, Streamlit apps, or even as a REST API through FastAPI or Flask.
- Reliability: The platform's Docker-based architecture is intended to give consistent behavior whether it runs on a local machine or in the cloud.
- Community Support: A vibrant community exists around PandasAI, offering resources like documentation, example frameworks, and discussion forums for troubleshooting and collaboration.
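The Rapid Deployment item mentions exposing PandasAI as a REST API through Flask. A rough sketch of that idea follows, under stated assumptions: the `/chat` route, the `question` field, and the helper names `handle_chat` and `create_app` are hypothetical and not part of PandasAI. The request handling is kept in a plain function so it can be exercised without a web server, and Flask is only imported when the app is actually built.

```python
from typing import Callable, Tuple


def handle_chat(payload: dict, chat_fn: Callable[[str], object]) -> Tuple[dict, int]:
    """Validate a JSON payload and forward its question to chat_fn.

    Returns a (response body, HTTP status code) pair.
    """
    question = payload.get("question") if isinstance(payload, dict) else None
    if not isinstance(question, str) or not question.strip():
        return {"error": "field 'question' must be a non-empty string"}, 400
    # The answer is stringified so it can always be serialized as JSON.
    return {"answer": str(chat_fn(question.strip()))}, 200


def create_app(chat_fn: Callable[[str], object]):
    """Wire handle_chat into a Flask app (hypothetical route: POST /chat)."""
    from flask import Flask, jsonify, request  # imported lazily; requires Flask 2.0+

    app = Flask(__name__)

    @app.post("/chat")
    def chat():
        body, status = handle_chat(request.get_json(silent=True) or {}, chat_fn)
        return jsonify(body), status

    return app
```

With an agent created as in the usage section, something like `create_app(agent.chat).run(port=8000)` would serve it; the port number is arbitrary.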
What are the use cases of PandasAI?
- Business Analytics: Companies can use PandasAI to analyze sales data, customer feedback, and operational metrics by simply asking questions to uncover insights and improve decision-making.
- Academic Research: Researchers can parse large datasets related to their studies and obtain quick summaries, trends, and patterns, significantly speeding up the research process.
- Financial Data Analysis: Financial analysts can leverage the tool to explore revenue streams, expenditure patterns, and market trends, enabling more informed investment decisions.
- Data Reporting: Create informative reports by querying data and visualizing the results automatically, streamlining the reporting process without manual data manipulation.
- Machine Learning Preparation: Data scientists can quickly gain insights about datasets and prepare data for further analysis in machine learning tasks, improving overall workflows.
How to use PandasAI?
- Installation: To get started, clone the repository and navigate to the project directory:

    git clone https://github.com/sinaptik-ai/pandas-ai/
    cd pandas-ai
    docker-compose build
- Running the Platform: Launch the service using:

    docker-compose up

  Once running, access the client interface at http://localhost:3000.
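Since the containers can take a moment to start, it may help to check that something is listening before opening the client interface. The sketch below uses only the Python standard library; the helper name `wait_for_port` and its default timings are assumptions made for illustration, and it only tests that a TCP connection succeeds, not that the interface has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing accepts the connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 3000)` after `docker-compose up` returns `True` as soon as the port accepts connections, or `False` if nothing is listening after 30 seconds.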
- Basic Usage: Import the PandasAI library, set your API key, and initialize an agent with your data:

    import os

    import pandas as pd
    from pandasai import Agent

    # Sample DataFrame
    sales_by_country = pd.DataFrame({
        "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                    "Spain", "Canada", "Australia", "Japan", "China"],
        "revenue": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
    })

    # Set the API key before creating the agent
    os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

    # Ask a question about the DataFrame in natural language
    agent = Agent(sales_by_country)
    print(agent.chat('Which are the top 5 countries by sales?'))
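Because the answer comes from an LLM, it can be worth cross-checking it against a direct computation. The following sketch answers the same question with plain pandas on the same sample data, with no PandasAI call or API key involved; the variable name `top5` is introduced here for illustration.

```python
import pandas as pd

# Same sample data as in the Basic Usage example.
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "revenue": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
})

# Keep the five rows with the highest revenue, largest first.
top5 = sales_by_country.nlargest(5, "revenue")
print(top5.to_string(index=False))
```

For this data the five rows are China, United States, Japan, Germany, and United Kingdom, which gives a reference to compare the agent's reply against.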
- Advanced Queries: You can ask complex questions involving multiple DataFrames:

    import pandas as pd
    from pandasai import Agent

    # Assumes PANDASAI_API_KEY is already set, as in the Basic Usage example
    employees_data = {
        'EmployeeID': [1, 2, 3, 4, 5],
        'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
        'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance'],
    }
    salaries_data = {
        'EmployeeID': [1, 2, 3, 4, 5],
        'Salary': [5000, 6000, 4500, 7000, 5500],
    }

    employees_df = pd.DataFrame(employees_data)
    salaries_df = pd.DataFrame(salaries_data)

    # Pass both DataFrames so the agent can relate them when answering
    agent = Agent([employees_df, salaries_df])
    print(agent.chat("Who gets paid the most?"))
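Answering "Who gets paid the most?" requires relating the two tables through `EmployeeID`. As a point of comparison, here is a minimal sketch of that join done by hand in plain pandas on the same sample data; it is not a description of how PandasAI resolves the question internally, and the names `merged` and `top_earner` are introduced for illustration.

```python
import pandas as pd

# Same sample data as in the Advanced Queries example.
employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key, then pick the row with the highest salary.
merged = employees_df.merge(salaries_df, on="EmployeeID", how="inner")
top_earner = merged.loc[merged["Salary"].idxmax()]
print(top_earner["Name"], top_earner["Salary"])
```

On this data the highest-paid employee is Olivia, with a salary of 7000.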