Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interceptor to capture messages into a database #642

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

ahuang11
Copy link
Contributor

@ahuang11 ahuang11 commented Aug 8, 2024

patch_client intercepts messages from openai's client chat completions' create, and stores them into a database.

Additionally, patch_client_response catches the response, locates the previous batch, and also stores the response.

The reason it's separate is because of instructor:

  • patch_client: We need to patch it before instructor patches it to get the function/tool_choice keyword arguments input by instructor
  • patch_client_response: We need to patch it after instructor patches it to get the properly formatted BaseModel

Example usage outside of Lumen

import openai
import instructor
from pydantic import BaseModel
from lumen.ai.interceptor import OpenAIInterceptor

stream = True
llm = openai.AsyncClient()

interceptor = OpenAIInterceptor()
interceptor.patch_client(llm, mode="store_inputs")
llm = instructor.patch(llm)
interceptor.patch_client_response(llm)

class Feelings(BaseModel):

    feelings: list[str]

    rating: int

response = await llm.chat.completions.create(
    messages=[
        {"role": "system", "content": "Be a helpful assistant."},
        {"role": "user", "content": "How are you?"},
    ],
    model="gpt-4o-mini",
    stream=stream,
    response_model=instructor.Partial[Feelings]
)
if stream:
    async for out in response:
        ...

response = await llm.chat.completions.create(
    messages=[
        {"role": "system", "content": "Be happy."},
        {"role": "user", "content": "How about you?"},
    ],
    model="gpt-4o-mini",
    stream=stream,
)
if stream:
    async for chunk in response:
        ...

interceptor.get_session()

Returns

[{'batch_id': 1,
  'messages': [{'role': 'system', 'content': 'Be a helpful assistant.'},
   {'role': 'user', 'content': 'How are you?'}],
  'kwargs': {'model': 'gpt-4o-mini',
   'stream': True,
   'tools': [{'type': 'function',
     'function': {'name': 'PartialFeelings',
      'description': 'Correctly extracted `PartialFeelings` with all the required parameters with correct types',
      'parameters': {'properties': {'feelings': {'items': {'type': 'string'},
         'title': 'Feelings',
         'type': 'array'},
        'rating': {'title': 'Rating', 'type': 'integer'}},
       'required': ['feelings', 'rating'],
       'type': 'object'}}}],
   'tool_choice': {'type': 'function',
    'function': {'name': 'PartialFeelings'}}},
  'response': '{"feelings": ["happy", "curious"], "rating": 8}'},
 {'batch_id': 2,
  'messages': [{'role': 'system', 'content': 'Be happy.'},
   {'role': 'user', 'content': 'How about you?'}],
  'kwargs': {'model': 'gpt-4o-mini', 'stream': True},
  'response': "I'm just a program, but I'm here to help you! What would you like to talk about?"}]

session_id = every time interceptor gets instantiated
batch_id = every time llm.chat.completions.create gets called

@ahuang11 ahuang11 changed the title add interceptor Add interceptor to capture messages into a database Aug 8, 2024
@ahuang11 ahuang11 marked this pull request as ready for review August 8, 2024 15:08
Copy link

codecov bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 0% with 223 lines in your changes missing coverage. Please review.

Project coverage is 54.88%. Comparing base (c2ea8c9) to head (e4c43b0).

Files with missing lines Patch % Lines
lumen/ai/interceptor.py 0.00% 169 Missing ⚠️
lumen/ai/llm.py 0.00% 54 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #642      +/-   ##
==========================================
- Coverage   55.81%   54.88%   -0.93%     
==========================================
  Files          98       99       +1     
  Lines       11879    12080     +201     
==========================================
  Hits         6630     6630              
- Misses       5249     5450     +201     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ahuang11
Copy link
Contributor Author

ahuang11 commented Aug 8, 2024

Example Lumen output
[{'batch_id': 1,
  'messages': [{'role': 'system',
      'content': "Select the most relevant agent for the user's query.\n\nEach agent can request other agents to fill in the blanks, so pick the agent that can best answer the entire query.\n\nHere's the choice of agents and their uses:\n```\n\n- `SourceAgent`: The SourceAgent allows a user to provide an input source. Should only be used if the user explicitly requests adding a source or no source is in memory.\n\n- `TableAgent`: Displays a single table / dataset. Does not discuss.\n\n- `SQLAgent`: Responsible for generating and modifying SQL queries to answer user queries about the data, such querying subsets of the data, aggregating the data and calculating results.\n\n- `PipelineAgent`: Generates a data pipeline by applying SQL to the data. If the user asks to calculate, aggregate, select or perform some other operation on the data this is your best best.\n\n- `hvPlotAgent`: Generates a plot of the data given a user prompt. If the user asks to plot, visualize or render the data this is your best best.\n\n- `ChatAgent`: Chats and provides info about high level data related topics, e.g. what datasets are available, the columns of the data or statistics about the data, and continuing the conversation. Is capable of providing suggestions to get started or comment on interesting tidbits. If data is available, it can also talk about the data itself.\n\n- `AnalysisAgent`: Available analyses include: - `YdataProfiling`: Provides high level data profiling. Select this agent to perform one of these analyses.\n\n```"},
  {'role': 'user', 'content': 'Show table'}],
  'kwargs': {'model': 'gpt-4o-mini',
  'temperature': 0.2,
  'tools': [{'type': 'function',
      'function': {'name': 'RelevantAgent',
      'description': 'Correctly extracted `RelevantAgent` with all the required parameters with correct types',
      'parameters': {'properties': {'chain_of_thought': {'description': 'Explain in your own words, what the user wants.',
          'title': 'Chain Of Thought',
          'type': 'string'},
          'agent': {'description': 'The most relevant agent to use.',
          'enum': ['SourceAgent',
          'TableAgent',
          'SQLAgent',
          'PipelineAgent',
          'hvPlotAgent',
          'ChatAgent',
          'AnalysisAgent'],
          'title': 'Agent',
          'type': 'string'}},
      'required': ['agent', 'chain_of_thought'],
      'type': 'object'}}}],
  'tool_choice': {'type': 'function', 'function': {'name': 'RelevantAgent'}}},
  'response': '{"chain_of_thought": "The user is requesting to see a table, which indicates they want to display a dataset. The most suitable agent for this task is the TableAgent, as it is specifically designed to display a single table or dataset without further discussion.", "agent": "TableAgent"}'},
  {'batch_id': 2,
  'messages': [{'role': 'system',
      'content': 'Based on the latest user\'s query, decide whether the current table and data contains the the required data. If the user is referencing "this data" or "that" they probably are referring to the current data. Pay particular attention to whether the data actually contains the columns required to answer the query, e.g. if they are asking for a location but there are no location related column the data is invalid.\n\n### Current Table:\n```\nwindturbines.parquet\n```\n\n### Current SQL:\n```\nNone\n```\n\n### Current Data `{column: stat}`:\n```\n{\'case_id\': {\'type\': \'integer\', \'min\': 3000001, \'max\': 3125948}, \'faa_ors\': {\'type\': \'string\', \'enum\': [\'39-003863\', \'19-020351\', \'19-020377\', \'19-020448\', \'19-058716\', \'...\']}, \'faa_asn\': {\'type\': \'string\', \'enum\': [\'2016-WTE-5934-OE\', \'2013-WTE-5497-OE\', \'2018-WTE-9908-OE\', \'2018-WTE-9870-OE\', \'2018-WTE-9867-OE\', \'...\']}, \'usgs_pr_id\': {\'type\': \'number\', \'min\': 1.0, \'max\': 49135.0}, \'eia_id\': {\'type\': \'number\', \'min\': 90.0, \'max\': 65525.0}, \'t_state\': {\'type\': \'string\', \'enum\': [\'OH\', \'IN\', \'CO\', \'OR\', \'AK\', \'...\']}, \'t_county\': {\'type\': \'string\', \'enum\': [\'Kern County\', \'Poweshiek County\', \'Barnstable County\', \'Laramie County\', \'Blair County\', \'...\']}, \'t_fips\': {\'type\': \'integer\', \'min\': 2013, \'max\': 72133}, \'p_name\': {\'type\': \'string\', \'enum\': [\'Agriwind\', \'Ainsworth Wind Project (NPPD)\', \'Alta III\', \'Amazon Wind Farm (Fowler Ridge)\', \'Anacacho\', \'...\']}, \'p_year\': {\'type\': \'number\', \'min\': 1981.0, \'max\': 2022.0}, \'p_tnum\': {\'type\': \'integer\', \'min\': 1, \'max\': 731}, \'p_cap\': {\'type\': \'number\', \'min\': 0.05, \'max\': 1055.6}, \'t_manu\': {\'type\': \'string\', \'enum\': [\'Siemens\', \'Kenersys\', \'Aeronautica\', \'AAER\', \'Vensys\', \'...\']}, \'t_model\': {\'type\': \'string\', \'enum\': [\'SWT-2.3-108\', \'ECO 86\', \'V82-1.65\', \'FL1500\', \'GE2.85-103\', \'...\']}, \'t_cap\': {\'type\': \'number\', \'min\': 50.0, \'max\': 6000.0}, \'t_hh\': {\'type\': \'number\', \'min\': 19.0, \'max\': 137.0}, \'t_rd\': {\'type\': \'number\', \'min\': 13.4, \'max\': 162.0}, \'t_rsa\': {\'type\': \'number\', \'min\': 141.03, \'max\': 20611.99}, \'t_ttlh\': {\'type\': \'number\', \'min\': 30.4, \'max\': 205.4}, \'retrofit\': {\'type\': \'integer\', \'min\': 0, \'max\': 1}, \'retrofit_year\': {\'type\': \'number\', \'min\': 2015.0, \'max\': 2021.0}, \'t_conf_atr\': {\'type\': \'integer\', \'min\': 1, \'max\': 3}, \'t_conf_loc\': {\'type\': \'integer\', \'min\': 1, \'max\': 3}, \'t_img_date\': {\'type\': \'string\', \'format\': \'datetime\', \'min\': \'1993-06-14 00:00:00\', \'max\': \'2022-09-06 00:00:00\'}, \'t_img_srce\': {\'type\': \'string\', \'enum\': [\'NAIP\', \'Google Earth\', \'Bing Maps Aerial\', \'Digital Globe\']}, \'xlong\': {\'type\': \'number\', \'min\': -171.713074, \'max\': 144.722656}, \'ylat\': {\'type\': \'number\', \'min\': 13.389381, \'max\': 66.839905}, \'easting\': {\'type\': \'number\', \'min\': -19115011.960227706, \'max\': 16110452.372170098}, \'northing\': {\'type\': \'number\', \'min\': 1504253.4146257618, \'max\': 10110596.985745305}}\n```\n\n\nIf the user requests one of the current analyses, the data is assumed to be valid.\n\n### Current Analyses:\n\n- <class \'bokeh_app_5a551904cf0e428da1abcc36b35f26d4.YdataProfiling\'>'},
  {'role': 'assistant',
      'content': "source:\n  initializers:\n  - INSTALL httpfs;\n  - LOAD httpfs;\n  tables:\n  - windturbines.parquet\n  type: duckdb\n  uri: ':memory:'\ntable: windturbines.parquet\n"},
  {'role': 'user', 'content': 'Tell me about this table'}],
  'kwargs': {'model': 'gpt-4o-mini',
  'temperature': 0.2,
  'tools': [{'type': 'function',
      'function': {'name': 'Validity',
      'description': 'Correctly extracted `Validity` with all the required parameters with correct types',
      'parameters': {'properties': {'correct_assessment': {'description': "\n        Thoughts on whether the current table meets the requirement\n        to answer the user's query, i.e. table contains all necessary columns,\n        unless user explicitly asks for a refresh.\n        ",
          'title': 'Correct Assessment',
          'type': 'string'},
          'is_invalid': {'anyOf': [{'enum': ['table', 'sql'], 'type': 'string'},
          {'type': 'null'}],
          'description': 'Whether the `table` or `sql` is invalid or no longer relevant depending on correct assessment. None if valid.',
          'title': 'Is Invalid'}},
      'required': ['correct_assessment', 'is_invalid'],
      'type': 'object'}}}],
  'tool_choice': {'type': 'function', 'function': {'name': 'Validity'}}},
  'response': '{"correct_assessment": "The table contains data related to wind turbines, including various attributes such as case ID, FAA identifiers, state, county, project name, year, capacity, manufacturer, model, and geographic coordinates (longitude and latitude).", "is_invalid": null}'},
  {'batch_id': 3,
  'messages': [{'role': 'system',
      'content': "Select the most relevant agent for the user's query.\n\nEach agent can request other agents to fill in the blanks, so pick the agent that can best answer the entire query.\n\nHere's the choice of agents and their uses:\n```\n\n- `SourceAgent`: The SourceAgent allows a user to provide an input source. Should only be used if the user explicitly requests adding a source or no source is in memory.\n\n- `TableAgent`: Displays a single table / dataset. Does not discuss.\n\n- `SQLAgent`: Responsible for generating and modifying SQL queries to answer user queries about the data, such querying subsets of the data, aggregating the data and calculating results.\n\n- `PipelineAgent`: Generates a data pipeline by applying SQL to the data. If the user asks to calculate, aggregate, select or perform some other operation on the data this is your best best.\n\n- `hvPlotAgent`: Generates a plot of the data given a user prompt. If the user asks to plot, visualize or render the data this is your best best.\n\n- `ChatAgent`: Chats and provides info about high level data related topics, e.g. what datasets are available, the columns of the data or statistics about the data, and continuing the conversation. Is capable of providing suggestions to get started or comment on interesting tidbits. If data is available, it can also talk about the data itself.\n\n- `AnalysisAgent`: Available analyses include: - `YdataProfiling`: Provides high level data profiling. Select this agent to perform one of these analyses.\n\n```"},
  {'role': 'assistant',
      'content': 'The user is requesting to see a table, which indicates they want to display a dataset. The most suitable agent for this task is the TableAgent, as it is specifically designed to display a single table or dataset without further discussion.'},
  {'role': 'assistant',
      'content': "source:\n  initializers:\n  - INSTALL httpfs;\n  - LOAD httpfs;\n  tables:\n  - windturbines.parquet\n  type: duckdb\n  uri: ':memory:'\ntable: windturbines.parquet\n"},
  {'role': 'user', 'content': 'Tell me about this table'}],
  'kwargs': {'model': 'gpt-4o-mini',
  'temperature': 0.2,
  'tools': [{'type': 'function',
      'function': {'name': 'RelevantAgent',
      'description': 'Correctly extracted `RelevantAgent` with all the required parameters with correct types',
      'parameters': {'properties': {'chain_of_thought': {'description': 'Explain in your own words, what the user wants.',
          'title': 'Chain Of Thought',
          'type': 'string'},
          'agent': {'description': 'The most relevant agent to use.',
          'enum': ['SourceAgent',
          'TableAgent',
          'SQLAgent',
          'PipelineAgent',
          'hvPlotAgent',
          'ChatAgent',
          'AnalysisAgent'],
          'title': 'Agent',
          'type': 'string'}},
      'required': ['agent', 'chain_of_thought'],
      'type': 'object'}}}],
  'tool_choice': {'type': 'function', 'function': {'name': 'RelevantAgent'}}},
  'response': '{"chain_of_thought": "The user wants to know more about the structure and contents of the \'windturbines.parquet\' table, such as its columns and data types, which is best handled by the ChatAgent.", "agent": "ChatAgent"}'},
  {'batch_id': 4,
  'messages': [{'role': 'system',
      'content': "Be a helpful chatbot to talk about high-level data exploration like what datasets are available and what each column represents. Provide suggestions to get started if necessary.Be a helpful chatbot to talk about high-level data exploration like what datasets are available and what each column represents. Provide suggestions to get started if necessary.\n### CONTEXT: windturbines.parquet with schema: {'case_id': {'type': 'integer', 'min': 3000001, 'max': 3125948}, 'faa_ors': {'type': 'string', 'enum': ['39-003863', '19-020351', '19-020377', '19-020448', '19-058716', '...']}, 'faa_asn': {'type': 'string', 'enum': ['2016-WTE-5934-OE', '2013-WTE-5497-OE', '2018-WTE-9908-OE', '2018-WTE-9870-OE', '2018-WTE-9867-OE', '...']}, 'usgs_pr_id': {'type': 'number', 'min': 1.0, 'max': 49135.0}, 'eia_id': {'type': 'number', 'min': 90.0, 'max': 65525.0}, 't_state': {'type': 'string', 'enum': ['OH', 'IN', 'CO', 'OR', 'AK', '...']}, 't_county': {'type': 'string', 'enum': ['Kern County', 'Poweshiek County', 'Barnstable County', 'Laramie County', 'Blair County', '...']}, 't_fips': {'type': 'integer', 'min': 2013, 'max': 72133}, 'p_name': {'type': 'string', 'enum': ['Agriwind', 'Ainsworth Wind Project (NPPD)', 'Alta III', 'Amazon Wind Farm (Fowler Ridge)', 'Anacacho', '...']}, 'p_year': {'type': 'number', 'min': 1981.0, 'max': 2022.0}, 'p_tnum': {'type': 'integer', 'min': 1, 'max': 731}, 'p_cap': {'type': 'number', 'min': 0.05, 'max': 1055.6}, 't_manu': {'type': 'string', 'enum': ['Siemens', 'Kenersys', 'Aeronautica', 'AAER', 'Vensys', '...']}, 't_model': {'type': 'string', 'enum': ['SWT-2.3-108', 'ECO 86', 'V82-1.65', 'FL1500', 'GE2.85-103', '...']}, 't_cap': {'type': 'number', 'min': 50.0, 'max': 6000.0}, 't_hh': {'type': 'number', 'min': 19.0, 'max': 137.0}, 't_rd': {'type': 'number', 'min': 13.4, 'max': 162.0}, 't_rsa': {'type': 'number', 'min': 141.03, 'max': 20611.99}, 't_ttlh': {'type': 'number', 'min': 30.4, 'max': 205.4}, 'retrofit': {'type': 'integer', 'min': 0, 'max': 1}, 'retrofit_year': {'type': 'number', 'min': 2015.0, 'max': 2021.0}, 't_conf_atr': {'type': 'integer', 'min': 1, 'max': 3}, 't_conf_loc': {'type': 'integer', 'min': 1, 'max': 3}, 't_img_date': {'type': 'string', 'format': 'datetime', 'min': '1993-06-14 00:00:00', 'max': '2022-09-06 00:00:00'}, 't_img_srce': {'type': 'string', 'enum': ['NAIP', 'Google Earth', 'Bing Maps Aerial', 'Digital Globe']}, 'xlong': {'type': 'number', 'min': -171.713074, 'max': 144.722656}, 'ylat': {'type': 'number', 'min': 13.389381, 'max': 66.839905}, 'easting': {'type': 'number', 'min': -19115011.960227706, 'max': 16110452.372170098}, 'northing': {'type': 'number', 'min': 1504253.4146257618, 'max': 10110596.985745305}}\nHere's a summary of the dataset the user just asked about:\n```\n{'summary': {'total_table_cells': 2098353, 'total_shape': (72357, 29), 'is_summarized': True}, 'stats': {'case_id': {'count': '5000', 'mean': '3.1e+06', 'min': '3000088', '50%': '3053985', 'max': '3125610', 'std': '3.5e+04', 'nulls': '0'}, 'usgs_pr_id': {'count': '2638', 'mean': '2.7e+04', 'min': '13', '50%': '2.9e+04', 'max': '49121', 'std': '1.4e+04', 'nulls': '2362'}, 'eia_id': {'count': '4635', 'mean': '5.8e+04', 'min': '90', '50%': '57851', 'max': '65129', 'std': '5.8e+03', 'nulls': '365'}, 't_fips': {'count': '5000', 'mean': '3.3e+04', 'min': '2050', '50%': '36053', 'max': '72133', 'std': '1.6e+04', 'nulls': '0'}, 'p_year': {'count': '4968', 'mean': '2.0e+03', 'min': '1982', '50%': '2012', 'max': '2022', 'std': '8.0', 'nulls': '32'}, 'p_tnum': {'count': '5000', 'mean': '1.1e+02', 'min': '1', '50%': '86', 'max': '713', 'std': '95.0', 'nulls': '0'}, 'p_cap': {'count': '4778', 'mean': '1.8e+02', 'min': '0.1', '50%': '162', 'max': '1.1e+03', 'std': '1.3e+02', 'nulls': '222'}, 't_cap': {'count': '4690', 'mean': '2.0e+03', 'min': '50', '50%': '2000', 'max': '6000', 'std': '7.7e+02', 'nulls': '310'}, 't_hh': {'count': '4651', 'mean': '81.8', 'min': '19', '50%': '80', 'max': '131', 'std': '12.2', 'nulls': '349'}, 't_rd': {'count': '4656', 'mean': '97.7', 'min': '15', '50%': '100', 'max': '162', 'std': '23.7', 'nulls': '344'}, 't_rsa': {'count': '4656', 'mean': '7.9e+03', 'min': '1.8e+02', '50%': '7.9e+03', 'max': '2.1e+04', 'std': '3.5e+03', 'nulls': '344'}, 't_ttlh': {'count': '4651', 'mean': '1.3e+02', 'min': '30.4', '50%': '1.3e+02', 'max': '200', 'std': '22.7', 'nulls': '349'}, 'retrofit': {'count': '5000', 'mean': '0.1', 'min': '0', '50%': '0', 'max': '1', 'std': '0.3', 'nulls': '0'}, 'retrofit_year': {'count': '451', 'mean': '2.0e+03', 'min': '2017', '50%': '2019', 'max': '2021', 'std': '1.3', 'nulls': '4549'}, 't_conf_atr': {'count': '5000', 'mean': '2.8', 'min': '1', '50%': '3', 'max': '3', 'std': '0.6', 'nulls': '0'}, 't_conf_loc': {'count': '5000', 'mean': '2.9', 'min': '1', '50%': '3', 'max': '3', 'std': '0.4', 'nulls': '0'}, 't_img_date': {'count': '4428', 'mean': '2018-07-05 10:52:40.975609', 'min': '2006-01-01 00:00:00', '50%': '2018-11-03 00:00:00', 'max': '2022-09-03 00:00:00', 'std': 'nan', 'nulls': '572'}, 'xlong': {'count': '5000', 'mean': '-1.0e+02', 'min': '-1.7e+02', '50%': '-99.5', 'max': '-65.7', 'std': '11.1', 'nulls': '0'}, 'ylat': {'count': '5000', 'mean': '38.3', 'min': '18.0', '50%': '38.2', 'max': '66.6', 'std': '5.4', 'nulls': '0'}, 'easting': {'count': '5000', 'mean': '-1.1e+07', 'min': '-1.8e+07', '50%': '-1.1e+07', 'max': '-7.3e+06', 'std': '1.2e+06', 'nulls': '0'}, 'northing': {'count': '5000', 'mean': '4.7e+06', 'min': '2.0e+06', '50%': '4.6e+06', 'max': '1.0e+07', 'std': '7.8e+05', 'nulls': '0'}, 'faa_ors': {'nunique': 4619, 'lengths': {'max': 9.0, 'min': 9.0, 'mean': 9.0}, 'nulls': 381}, 'faa_asn': {'nunique': 4611, 'lengths': {'max': 17.0, 'min': 13.0, 'mean': 16.048822639879024}, 'nulls': 371}, 't_state': {'nunique': 39, 'lengths': {'max': 2, 'min': 2, 'mean': 2.0}, 'nulls': 0}, 't_county': {'nunique': 417, 'lengths': {'max': 31, 'min': 10, 'mean': 13.5078}, 'nulls': 0}, 'p_name': {'nunique': 1060, 'lengths': {'max': 42, 'min': 3, 'mean': 14.4844}, 'nulls': 0}, 't_manu': {'nunique': 31, 'lengths': {'max': 31.0, 'min': 3.0, 'mean': 7.569536423841059}, 'nulls': 319}, 't_model': {'nunique': 200, 'lengths': {'max': 14.0, 'min': 3.0, 'mean': 8.57910511667737}, 'nulls': 329}, 't_img_srce': {'nunique': 4, 'lengths': {'max': 16, 'min': 4, 'mean': 12.49}, 'nulls': 0}}}\n```"},
  {'role': 'assistant',
      'content': 'The user is requesting to see a table, which indicates they want to display a dataset. The most suitable agent for this task is the TableAgent, as it is specifically designed to display a single table or dataset without further discussion.'},
  {'role': 'assistant',
      'content': "source:\n  initializers:\n  - INSTALL httpfs;\n  - LOAD httpfs;\n  tables:\n  - windturbines.parquet\n  type: duckdb\n  uri: ':memory:'\ntable: windturbines.parquet\n"},
  {'role': 'user', 'content': 'Tell me about this table'}],
  'kwargs': {'model': 'gpt-4o-mini', 'temperature': 0.2, 'stream': True},
  'response': "The dataset you are looking at is related to wind turbines and contains various attributes that describe each turbine. Here’s a breakdown of the columns in the dataset along with their descriptions:\n\n1. **case_id**: An integer identifier for each case (turbine).\n2. **faa_ors**: A string representing the FAA (Federal Aviation Administration) Obstruction Registration System number.\n3. **faa_asn**: A string representing the FAA Aeronautical Study Number.\n4. **usgs_pr_id**: A numeric identifier from the US Geological Survey.\n5. **eia_id**: A numeric identifier from the Energy Information Administration.\n6. **t_state**: A string representing the state where the turbine is located (e.g., 'OH', 'IN').\n7. **t_county**: A string representing the county where the turbine is located.\n8. **t_fips**: An integer representing the Federal Information Processing Standards code for the county.\n9. **p_name**: A string representing the name of the project associated with the turbine.\n10. **p_year**: A numeric value indicating the year the project was established.\n11. **p_tnum**: An integer representing the total number of turbines in the project.\n12. **p_cap**: A numeric value representing the capacity of the project in megawatts.\n13. **t_manu**: A string representing the manufacturer of the turbine.\n14. **t_model**: A string representing the model of the turbine.\n15. **t_cap**: A numeric value representing the capacity of the turbine in megawatts.\n16. **t_hh**: A numeric value representing the hub height of the turbine in meters.\n17. **t_rd**: A numeric value representing the rotor diameter in meters.\n18. **t_rsa**: A numeric value representing the rotor swept area in square meters.\n19. **t_ttlh**: A numeric value representing the total height of the turbine in meters.\n20. **retrofit**: An integer indicating whether the turbine has been retrofitted (0 for no, 1 for yes).\n21. **retrofit_year**: A numeric value indicating the year of retrofit.\n22. **t_conf_atr**: An integer representing the configuration attribute (1-3).\n23. **t_conf_loc**: An integer representing the configuration location (1-3).\n24. **t_img_date**: A string representing the date of the image taken (in datetime format).\n25. **t_img_srce**: A string representing the source of the image (e.g., 'NAIP', 'Google Earth').\n26. **xlong**: A numeric value representing the longitude of the turbine's location.\n27. **ylat**: A numeric value representing the latitude of the turbine's location.\n28. **easting**: A numeric value representing the easting coordinate in a projected coordinate system.\n29. **northing**: A numeric value representing the northing coordinate in a projected coordinate system.\n\n### Suggestions to Get Started:\n- **Data Cleaning**: Check for any missing values or outliers in the dataset.\n- **Descriptive Statistics**: Generate summary statistics for numerical columns to understand the distribution of values.\n- **Visualizations**: Create visualizations (e.g., histograms, scatter plots) to explore relationships between different attributes, such as turbine capacity and height.\n- **Geospatial Analysis**: If interested in geographical aspects, consider mapping the turbines based on their latitude and longitude.\n\nIf you have specific questions or need help with a particular analysis, feel free to ask!"},
  {'batch_id': 5,
  'messages': [{'role': 'system',
      'content': 'Based on the latest user\'s query, decide whether the current table and data contains the the required data. If the user is referencing "this data" or "that" they probably are referring to the current data. Pay particular attention to whether the data actually contains the columns required to answer the query, e.g. if they are asking for a location but there are no location related column the data is invalid.\n\n### Current Table:\n```\nwindturbines.parquet\n```\n\n### Current SQL:\n```\nNone\n```\n\n### Current Data `{column: stat}`:\n```\n{\'case_id\': {\'type\': \'integer\', \'min\': 3000001, \'max\': 3125948}, \'faa_ors\': {\'type\': \'string\', \'enum\': [\'39-003863\', \'19-020351\', \'19-020377\', \'19-020448\', \'19-058716\', \'...\']}, \'faa_asn\': {\'type\': \'string\', \'enum\': [\'2016-WTE-5934-OE\', \'2013-WTE-5497-OE\', \'2018-WTE-9908-OE\', \'2018-WTE-9870-OE\', \'2018-WTE-9867-OE\', \'...\']}, \'usgs_pr_id\': {\'type\': \'number\', \'min\': 1.0, \'max\': 49135.0}, \'eia_id\': {\'type\': \'number\', \'min\': 90.0, \'max\': 65525.0}, \'t_state\': {\'type\': \'string\', \'enum\': [\'OH\', \'IN\', \'CO\', \'OR\', \'AK\', \'...\']}, \'t_county\': {\'type\': \'string\', \'enum\': [\'Kern County\', \'Poweshiek County\', \'Barnstable County\', \'Laramie County\', \'Blair County\', \'...\']}, \'t_fips\': {\'type\': \'integer\', \'min\': 2013, \'max\': 72133}, \'p_name\': {\'type\': \'string\', \'enum\': [\'Agriwind\', \'Ainsworth Wind Project (NPPD)\', \'Alta III\', \'Amazon Wind Farm (Fowler Ridge)\', \'Anacacho\', \'...\']}, \'p_year\': {\'type\': \'number\', \'min\': 1981.0, \'max\': 2022.0}, \'p_tnum\': {\'type\': \'integer\', \'min\': 1, \'max\': 731}, \'p_cap\': {\'type\': \'number\', \'min\': 0.05, \'max\': 1055.6}, \'t_manu\': {\'type\': \'string\', \'enum\': [\'Siemens\', \'Kenersys\', \'Aeronautica\', \'AAER\', \'Vensys\', \'...\']}, \'t_model\': {\'type\': \'string\', \'enum\': [\'SWT-2.3-108\', \'ECO 86\', \'V82-1.65\', \'FL1500\', \'GE2.85-103\', \'...\']}, \'t_cap\': {\'type\': \'number\', \'min\': 50.0, \'max\': 6000.0}, \'t_hh\': {\'type\': \'number\', \'min\': 19.0, \'max\': 137.0}, \'t_rd\': {\'type\': \'number\', \'min\': 13.4, \'max\': 162.0}, \'t_rsa\': {\'type\': \'number\', \'min\': 141.03, \'max\': 20611.99}, \'t_ttlh\': {\'type\': \'number\', \'min\': 30.4, \'max\': 205.4}, \'retrofit\': {\'type\': \'integer\', \'min\': 0, \'max\': 1}, \'retrofit_year\': {\'type\': \'number\', \'min\': 2015.0, \'max\': 2021.0}, \'t_conf_atr\': {\'type\': \'integer\', \'min\': 1, \'max\': 3}, \'t_conf_loc\': {\'type\': \'integer\', \'min\': 1, \'max\': 3}, \'t_img_date\': {\'type\': \'string\', \'format\': \'datetime\', \'min\': \'1993-06-14 00:00:00\', \'max\': \'2022-09-06 00:00:00\'}, \'t_img_srce\': {\'type\': \'string\', \'enum\': [\'NAIP\', \'Google Earth\', \'Bing Maps Aerial\', \'Digital Globe\']}, \'xlong\': {\'type\': \'number\', \'min\': -171.713074, \'max\': 144.722656}, \'ylat\': {\'type\': \'number\', \'min\': 13.389381, \'max\': 66.839905}, \'easting\': {\'type\': \'number\', \'min\': -19115011.960227706, \'max\': 16110452.372170098}, \'northing\': {\'type\': \'number\', \'min\': 1504253.4146257618, \'max\': 10110596.985745305}}\n```\n\n\nIf the user requests one of the current analyses, the data is assumed to be valid.\n\n### Current Analyses:\n\n- <class \'bokeh_app_5a551904cf0e428da1abcc36b35f26d4.YdataProfiling\'>'},
  {'role': 'assistant',
      'content': "The dataset you are looking at is related to wind turbines and contains various attributes that describe each turbine. Here’s a breakdown of the columns in the dataset along with their descriptions:\n\n1. **case_id**: An integer identifier for each case (turbine).\n2. **faa_ors**: A string representing the FAA (Federal Aviation Administration) Obstruction Registration System number.\n3. **faa_asn**: A string representing the FAA Aeronautical Study Number.\n4. **usgs_pr_id**: A numeric identifier from the US Geological Survey.\n5. **eia_id**: A numeric identifier from the Energy Information Administration.\n6. **t_state**: A string representing the state where the turbine is located (e.g., 'OH', 'IN').\n7. **t_county**: A string representing the county where the turbine is located.\n8. **t_fips**: An integer representing the Federal Information Processing Standards code for the county.\n9. **p_name**: A string representing the name of the project associated with the turbine.\n10. **p_year**: A numeric value indicating the year the project was established.\n11. **p_tnum**: An integer representing the total number of turbines in the project.\n12. **p_cap**: A numeric value representing the capacity of the project in megawatts.\n13. **t_manu**: A string representing the manufacturer of the turbine.\n14. **t_model**: A string representing the model of the turbine.\n15. **t_cap**: A numeric value representing the capacity of the turbine in megawatts.\n16. **t_hh**: A numeric value representing the hub height of the turbine in meters.\n17. **t_rd**: A numeric value representing the rotor diameter in meters.\n18. **t_rsa**: A numeric value representing the rotor swept area in square meters.\n19. **t_ttlh**: A numeric value representing the total height of the turbine in meters.\n20. **retrofit**: An integer indicating whether the turbine has been retrofitted (0 for no, 1 for yes).\n21. **retrofit_year**: A numeric value indicating the year of retrofit.\n22. **t_conf_atr**: An integer representing the configuration attribute (1-3).\n23. **t_conf_loc**: An integer representing the configuration location (1-3).\n24. **t_img_date**: A string representing the date of the image taken (in datetime format).\n25. **t_img_srce**: A string representing the source of the image (e.g., 'NAIP', 'Google Earth').\n26. **xlong**: A numeric value representing the longitude of the turbine's location.\n27. **ylat**: A numeric value representing the latitude of the turbine's location.\n28. **easting**: A numeric value representing the easting coordinate in a projected coordinate system.\n29. **northing**: A numeric value representing the northing coordinate in a projected coordinate system.\n\n### Suggestions to Get Started:\n- **Data Cleaning**: Check for any missing values or outliers in the dataset.\n- **Descriptive Statistics**: Generate summary statistics for numerical columns to understand the distribution of values.\n- **Visualizations**: Create visualizations (e.g., histograms, scatter plots) to explore relationships between different attributes, such as turbine capacity and height.\n- **Geospatial Analysis**: If interested in geographical aspects, consider mapping the turbines based on their latitude and longitude.\n\nIf you have specific questions or need help with a particular analysis, feel free to ask!"},
  {'role': 'user', 'content': 'Show me a plot of it'}],
  'kwargs': {'model': 'gpt-4o-mini',
  'temperature': 0.2,
  'tools': [{'type': 'function',
      'function': {'name': 'Validity',
      'description': 'Correctly extracted `Validity` with all the required parameters with correct types',
      'parameters': {'properties': {'correct_assessment': {'description': "\n        Thoughts on whether the current table meets the requirement\n        to answer the user's query, i.e. table contains all necessary columns,\n        unless user explicitly asks for a refresh.\n        ",
          'title': 'Correct Assessment',
          'type': 'string'},
          'is_invalid': {'anyOf': [{'enum': ['table', 'sql'], 'type': 'string'},
          {'type': 'null'}],
          'description': 'Whether the `table` or `sql` is invalid or no longer relevant depending on correct assessment. None if valid.',
          'title': 'Is Invalid'}},
      'required': ['correct_assessment', 'is_invalid'],
      'type': 'object'}}}],
  'tool_choice': {'type': 'function', 'function': {'name': 'Validity'}}},
  'response': '{"correct_assessment": "The current table contains the necessary data to create plots, including numerical and categorical columns that can be visualized.", "is_invalid": null}'},
  {'batch_id': 6,
  'messages': [{'role': 'system',
      'content': "Select the most relevant agent for the user's query.\n\nEach agent can request other agents to fill in the blanks, so pick the agent that can best answer the entire query.\n\nHere's the choice of agents and their uses:\n```\n\n- `SourceAgent`: The SourceAgent allows a user to provide an input source. Should only be used if the user explicitly requests adding a source or no source is in memory.\n\n- `TableAgent`: Displays a single table / dataset. Does not discuss.\n\n- `SQLAgent`: Responsible for generating and modifying SQL queries to answer user queries about the data, such querying subsets of the data, aggregating the data and calculating results.\n\n- `PipelineAgent`: Generates a data pipeline by applying SQL to the data. If the user asks to calculate, aggregate, select or perform some other operation on the data this is your best best.\n\n- `hvPlotAgent`: Generates a plot of the data given a user prompt. If the user asks to plot, visualize or render the data this is your best best.\n\n- `ChatAgent`: Chats and provides info about high level data related topics, e.g. what datasets are available, the columns of the data or statistics about the data, and continuing the conversation. Is capable of providing suggestions to get started or comment on interesting tidbits. If data is available, it can also talk about the data itself.\n\n- `AnalysisAgent`: Available analyses include: - `YdataProfiling`: Provides high level data profiling. Select this agent to perform one of these analyses.\n\n```"},
  {'role': 'assistant',
      'content': 'The table contains data related to wind turbines, including various attributes such as case ID, FAA identifiers, state, county, project name, year, capacity, manufacturer, model, and geographic coordinates (longitude and latitude).'},
  {'role': 'assistant',
      'content': "The dataset you are looking at is related to wind turbines and contains various attributes that describe each turbine. Here’s a breakdown of the columns in the dataset along with their descriptions:\n\n1. **case_id**: An integer identifier for each case (turbine).\n2. **faa_ors**: A string representing the FAA (Federal Aviation Administration) Obstruction Registration System number.\n3. **faa_asn**: A string representing the FAA Aeronautical Study Number.\n4. **usgs_pr_id**: A numeric identifier from the US Geological Survey.\n5. **eia_id**: A numeric identifier from the Energy Information Administration.\n6. **t_state**: A string representing the state where the turbine is located (e.g., 'OH', 'IN').\n7. **t_county**: A string representing the county where the turbine is located.\n8. **t_fips**: An integer representing the Federal Information Processing Standards code for the county.\n9. **p_name**: A string representing the name of the project associated with the turbine.\n10. **p_year**: A numeric value indicating the year the project was established.\n11. **p_tnum**: An integer representing the total number of turbines in the project.\n12. **p_cap**: A numeric value representing the capacity of the project in megawatts.\n13. **t_manu**: A string representing the manufacturer of the turbine.\n14. **t_model**: A string representing the model of the turbine.\n15. **t_cap**: A numeric value representing the capacity of the turbine in megawatts.\n16. **t_hh**: A numeric value representing the hub height of the turbine in meters.\n17. **t_rd**: A numeric value representing the rotor diameter in meters.\n18. **t_rsa**: A numeric value representing the rotor swept area in square meters.\n19. **t_ttlh**: A numeric value representing the total height of the turbine in meters.\n20. **retrofit**: An integer indicating whether the turbine has been retrofitted (0 for no, 1 for yes).\n21. **retrofit_year**: A numeric value indicating the year of retrofit.\n22. **t_conf_atr**: An integer representing the configuration attribute (1-3).\n23. **t_conf_loc**: An integer representing the configuration location (1-3).\n24. **t_img_date**: A string representing the date of the image taken (in datetime format).\n25. **t_img_srce**: A string representing the source of the image (e.g., 'NAIP', 'Google Earth').\n26. **xlong**: A numeric value representing the longitude of the turbine's location.\n27. **ylat**: A numeric value representing the latitude of the turbine's location.\n28. **easting**: A numeric value representing the easting coordinate in a projected coordinate system.\n29. **northing**: A numeric value representing the northing coordinate in a projected coordinate system.\n\n### Suggestions to Get Started:\n- **Data Cleaning**: Check for any missing values or outliers in the dataset.\n- **Descriptive Statistics**: Generate summary statistics for numerical columns to understand the distribution of values.\n- **Visualizations**: Create visualizations (e.g., histograms, scatter plots) to explore relationships between different attributes, such as turbine capacity and height.\n- **Geospatial Analysis**: If interested in geographical aspects, consider mapping the turbines based on their latitude and longitude.\n\nIf you have specific questions or need help with a particular analysis, feel free to ask!"},
  {'role': 'user', 'content': 'Show me a plot of it'}],
  'kwargs': {'model': 'gpt-4o-mini',
  'temperature': 0.2,
  'tools': [{'type': 'function',
      'function': {'name': 'RelevantAgent',
      'description': 'Correctly extracted `RelevantAgent` with all the required parameters with correct types',
      'parameters': {'properties': {'chain_of_thought': {'description': 'Explain in your own words, what the user wants.',
          'title': 'Chain Of Thought',
          'type': 'string'},
          'agent': {'description': 'The most relevant agent to use.',
          'enum': ['SourceAgent',
          'TableAgent',
          'SQLAgent',
          'PipelineAgent',
          'hvPlotAgent',
          'ChatAgent',
          'AnalysisAgent'],
          'title': 'Agent',
          'type': 'string'}},
      'required': ['agent', 'chain_of_thought'],
      'type': 'object'}}}],
  'tool_choice': {'type': 'function', 'function': {'name': 'RelevantAgent'}}},
  'response': '{"chain_of_thought": "The user wants to visualize the data related to wind turbines, which suggests that a plot would be appropriate. The hvPlotAgent is the best choice for generating a plot based on the dataset.", "agent": "hvPlotAgent"}'},
  {'batch_id': 7,
  'messages': [{'role': 'system',
      'content': "\n        Generate the plot the user requested.\n        Note that `x`, `y`, `by` and `groupby` fields MUST ALL be unique columns;\n        no repeated columns are allowed.\n        Do not arbitrarily set `groupby` and `by` fields unless explicitly requested.\n        If a histogram is requested, use `y` instead of `x`.\n        If x is categorical or strings, prefer barh over bar, and use `y` for the values.\n        `hvPlotUIView` displays provides a component for exploring datasets interactively.\n    \n\nThe data to be plotted originates from a table called 'windturbines.parquet' and follows the following JSON schema:\n```json\n{'case_id': {'type': 'integer'}, 'faa_ors': {'type': 'string', 'enum': [None, '19-028134', '19-028015', '19-027954', '19-028030', '...']}, 'faa_asn': {'type': 'string', 'enum': [None, '2014-WTE-4084-OE', '2015-WTE-6386-OE', '2016-WTE-9485-OE', '2014-WTE-4080-OE', '...']}, 'usgs_pr_id': {'type': 'number'}, 'eia_id': {'type': 'number'}, 't_state': {'type': 'string', 'enum': ['CA', 'IA', 'MA', 'OH', 'MN', '...']}, 't_county': {'type': 'string', 'enum': ['Kern County', 'Story County', 'Boone County', 'Poweshiek County', 'Hardin County', '...']}, 't_fips': {'type': 'integer'}, 'p_name': {'type': 'string', 'enum': ['251 Wind', '30 MW Iowa DG Portfolio', '6th Space Warning Squadron', 'AFCEE MMR Turbines', 'AG Land 1', '...']}, 'p_year': {'type': 'number'}, 'p_tnum': {'type': 'integer'}, 'p_cap': {'type': 'number'}, 't_manu': {'type': 'string', 'enum': ['Vestas', 'Nordex', 'GE Wind', 'Siemens', 'Alstom', '...']}, 't_model': {'type': 'string', 'enum': [None, 'AW125/3000', 'GE1.68-82.5', 'GE1.5-77', 'GE1.6-82.5', '...']}, 't_cap': {'type': 'number'}, 't_hh': {'type': 'number'}, 't_rd': {'type': 'number'}, 't_rsa': {'type': 'number'}, 't_ttlh': {'type': 'number'}, 'retrofit': {'type': 'integer'}, 'retrofit_year': {'type': 'number'}, 't_conf_atr': {'type': 'integer'}, 't_conf_loc': {'type': 'integer'}, 't_img_date': {'type': 'string', 'format': 'datetime'}, 't_img_srce': {'type': 'string', 'enum': ['Digital Globe', 'NAIP', 'Google Earth', 'Bing Maps Aerial']}, 'xlong': {'type': 'number'}, 'ylat': {'type': 'number'}, 'easting': {'type': 'number'}, 'northing': {'type': 'number'}}\n```"},
  {'role': 'assistant',
      'content': 'The table contains data related to wind turbines, including various attributes such as case ID, FAA identifiers, state, county, project name, year, capacity, manufacturer, model, and geographic coordinates (longitude and latitude).'},
  {'role': 'assistant',
      'content': "The dataset you are looking at is related to wind turbines and contains various attributes that describe each turbine. Here’s a breakdown of the columns in the dataset along with their descriptions:\n\n1. **case_id**: An integer identifier for each case (turbine).\n2. **faa_ors**: A string representing the FAA (Federal Aviation Administration) Obstruction Registration System number.\n3. **faa_asn**: A string representing the FAA Aeronautical Study Number.\n4. **usgs_pr_id**: A numeric identifier from the US Geological Survey.\n5. **eia_id**: A numeric identifier from the Energy Information Administration.\n6. **t_state**: A string representing the state where the turbine is located (e.g., 'OH', 'IN').\n7. **t_county**: A string representing the county where the turbine is located.\n8. **t_fips**: An integer representing the Federal Information Processing Standards code for the county.\n9. **p_name**: A string representing the name of the project associated with the turbine.\n10. **p_year**: A numeric value indicating the year the project was established.\n11. **p_tnum**: An integer representing the total number of turbines in the project.\n12. **p_cap**: A numeric value representing the capacity of the project in megawatts.\n13. **t_manu**: A string representing the manufacturer of the turbine.\n14. **t_model**: A string representing the model of the turbine.\n15. **t_cap**: A numeric value representing the capacity of the turbine in megawatts.\n16. **t_hh**: A numeric value representing the hub height of the turbine in meters.\n17. **t_rd**: A numeric value representing the rotor diameter in meters.\n18. **t_rsa**: A numeric value representing the rotor swept area in square meters.\n19. **t_ttlh**: A numeric value representing the total height of the turbine in meters.\n20. **retrofit**: An integer indicating whether the turbine has been retrofitted (0 for no, 1 for yes).\n21. **retrofit_year**: A numeric value indicating the year of retrofit.\n22. **t_conf_atr**: An integer representing the configuration attribute (1-3).\n23. **t_conf_loc**: An integer representing the configuration location (1-3).\n24. **t_img_date**: A string representing the date of the image taken (in datetime format).\n25. **t_img_srce**: A string representing the source of the image (e.g., 'NAIP', 'Google Earth').\n26. **xlong**: A numeric value representing the longitude of the turbine's location.\n27. **ylat**: A numeric value representing the latitude of the turbine's location.\n28. **easting**: A numeric value representing the easting coordinate in a projected coordinate system.\n29. **northing**: A numeric value representing the northing coordinate in a projected coordinate system.\n\n### Suggestions to Get Started:\n- **Data Cleaning**: Check for any missing values or outliers in the dataset.\n- **Descriptive Statistics**: Generate summary statistics for numerical columns to understand the distribution of values.\n- **Visualizations**: Create visualizations (e.g., histograms, scatter plots) to explore relationships between different attributes, such as turbine capacity and height.\n- **Geospatial Analysis**: If interested in geographical aspects, consider mapping the turbines based on their latitude and longitude.\n\nIf you have specific questions or need help with a particular analysis, feel free to ask!"},
  {'role': 'user', 'content': 'Show me a plot of it'}],
  'kwargs': {'model': 'gpt-4o-mini',
  'temperature': 0.2,
  'tools': [{'type': 'function',
      'function': {'name': 'hvPlotUIView',
      'description': 'Correctly extracted `hvPlotUIView` with all the required parameters with correct types',
      'parameters': {'properties': {'loading_indicator': {'default': True,
          'description': 'Whether to display a loading indicator on the View when the Pipeline is refreshing the data.',
          'title': 'Loading Indicator',
          'type': 'boolean'},
          'limit': {'anyOf': [{'type': 'integer'}, {'type': 'null'}],
          'default': None,
          'description': 'Limits the number of rows that are rendered.',
          'title': 'Limit'},
          'title': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
          'default': None,
          'description': 'The title of the view.',
          'title': 'Title'},
          'kind': {'default': None,
          'description': "The kind of plot, e.g. 'scatter' or 'line'.",
          'enum': ['area',
          'bar',
          'barh',
          'bivariate',
          'box',
          'contour',
          'contourf',
          'errorbars',
          'hist',
          'image',
          'kde',
          'labels',
          'line',
          'scatter',
          'heatmap',
          'hexbin',
          'ohlc',
          'points',
          'step',
          'violin'],
          'title': 'Kind',
          'type': 'string'},
          'x': {'default': None,
          'description': 'The column to render on the x-axis.',
          'enum': ['case_id',
          'faa_ors',
          'faa_asn',
          'usgs_pr_id',
          'eia_id',
          't_state',
          't_county',
          't_fips',
          'p_name',
          'p_year',
          'p_tnum',
          'p_cap',
          't_manu',
          't_model',
          't_cap',
          't_hh',
          't_rd',
          't_rsa',
          't_ttlh',
          'retrofit',
          'retrofit_year',
          't_conf_atr',
          't_conf_loc',
          't_img_date',
          't_img_srce',
          'xlong',
          'ylat',
          'easting',
          'northing'],
          'title': 'X',
          'type': 'string'},
          'y': {'default': None,
          'description': 'The column to render on the y-axis.',
          'enum': ['case_id',
          'faa_ors',
          'faa_asn',
          'usgs_pr_id',
          'eia_id',
          't_state',
          't_county',
          't_fips',
          'p_name',
          'p_year',
          'p_tnum',
          'p_cap',
          't_manu',
          't_model',
          't_cap',
          't_hh',
          't_rd',
          't_rsa',
          't_ttlh',
          'retrofit',
          'retrofit_year',
          't_conf_atr',
          't_conf_loc',
          't_img_date',
          't_img_srce',
          'xlong',
          'ylat',
          'easting',
          'northing'],
          'title': 'Y',
          'type': 'string'},
          'by': {'default': None,
          'description': 'The column(s) to facet the plot by.',
          'items': {'enum': ['case_id',
          'faa_ors',
          'faa_asn',
          'usgs_pr_id',
          'eia_id',
          't_state',
          't_county',
          't_fips',
          'p_name',
          'p_year',
          'p_tnum',
          'p_cap',
          't_manu',
          't_model',
          't_cap',
          't_hh',
          't_rd',
          't_rsa',
          't_ttlh',
          'retrofit',
          'retrofit_year',
          't_conf_atr',
          't_conf_loc',
          't_img_date',
          't_img_srce',
          'xlong',
          'ylat',
          'easting',
          'northing'],
          'type': 'string'},
          'title': 'By',
          'type': 'array'},
          'groupby': {'default': None,
          'description': 'The column(s) to group by.',
          'items': {'enum': ['case_id',
          'faa_ors',
          'faa_asn',
          'usgs_pr_id',
          'eia_id',
          't_state',
          't_county',
          't_fips',
          'p_name',
          'p_year',
          'p_tnum',
          'p_cap',
          't_manu',
          't_model',
          't_cap',
          't_hh',
          't_rd',
          't_rsa',
          't_ttlh',
          'retrofit',
          'retrofit_year',
          't_conf_atr',
          't_conf_loc',
          't_img_date',
          't_img_srce',
          'xlong',
          'ylat',
          'easting',
          'northing'],
          'type': 'string'},
          'title': 'Groupby',
          'type': 'array'},
          'geo': {'default': False,
          'description': 'Toggle True if the plot is on a geographic map.',
          'title': 'Geo',
          'type': 'boolean'},
          'chain_of_thought': {'description': 'Your thought process behind the plot.',
          'title': 'Chain Of Thought',
          'type': 'string'}},
      'required': ['chain_of_thought'],
      'type': 'object'}}}],
  'tool_choice': {'type': 'function', 'function': {'name': 'hvPlotUIView'}}},
  'response': '{"loading_indicator": true, "limit": null, "title": "Wind Turbines Data Plot", "kind": "scatter", "x": "t_cap", "y": "t_hh", "by": null, "groupby": null, "geo": false, "chain_of_thought": "This scatter plot visualizes the relationship between the turbine capacity (t_cap) and the hub height (t_hh) of wind turbines. It helps to understand how the capacity of a turbine correlates with its height."}'}]

Copy link
Member

@philippjfr philippjfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice, can you write out or generate some diagrams for the SQL schemas?

@ahuang11
Copy link
Contributor Author

ahuang11 commented Sep 18, 2024

Here's a diagram:

image
  • session_ids are per session (e.g. each time opening up the app and seeing the ChatInterface)

  • batches are a group of messages, e.g.

    • prompt may be: Show me olympics_medals table
      • Batch 1: RelevantAgent needs to choose the agent
        • {system_message_for_relevant_agent, past_messages, user_prompt}
      • Batch 2: TableAgent needs to select the table
        • {system_message_for_table_agent, past_messages, user_prompt}
  • messages are system/assistant/user messages

Here are the message_batches output
image

Here are the messages output
image

Here are the responses output
image

I don't really remember why I store message_kwargs; maybe just for archival reasons
image

lumen/ai/llm.py Outdated Show resolved Hide resolved
@ahuang11
Copy link
Contributor Author

Not planning to support MistralAI & AnthropicAI in this PR

@ahuang11
Copy link
Contributor Author

ahuang11 commented Oct 10, 2024

Updated the tables.

image

Each invocation is a llm.invoke() call.

In get_session I postprocess it a bit to gather input_id and prompt from messages (retrieving and comparing user message prompts)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants