The takeaway: this is a complex system. To define it, we need to break it into components at the right level of abstraction.

The end-to-end system diagram illustrates how we train, evaluate, log, and interface with the application. It has five primary high-level components, each encoded in its own color:

  1. User Facing (green)
  2. “Offline” Training (blue)
  3. “Online” Inference (yellow)
  4. Logging and Observability (white)
  5. External Resources (red)
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#000',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#7C0000',
      'lineColor': '#000',
      'secondaryColor': '#006100',
      'tertiaryColor': '#fff',
      'fontSize' : '32px'
    }
  }
}%%

flowchart TD
    A["User Query"] --> B["Query Preprocessor"]
    B --> C["Prompt Template"]
    D["Training Data"] --> E["Data Augmentation"]
    E --> F["Prompt Engineering"]
    G["Base LLM"] --> H["Fine-tuning"]
    F --> H
    I["Evaluation Metrics"] --> H
    H --> J["Trained LLM"]
    J --> C
    C --> K["SQL Generation"]
    K --> L["SQL Validator"]
    L --> M["Retrosheet Database"]
    M --> O["User Facing Results"]
    P["Logging & Monitoring"] <--> B & E & F & H & K & L & M

    style A fill: green
    style O fill: green

    style D fill: blue
    style E fill: blue
    style F fill: blue
    style I fill: blue
    style H fill: blue
    style J fill: blue

    style C fill: yellow, color: black
    style K fill: yellow, color: black
    style L fill: yellow, color: black
    style B fill: yellow, color: black

    style G fill: red
    style M fill: red

    style P fill: white, color: black
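
To make the yellow "online" path in the diagram more concrete, here is a minimal Python sketch of the flow from User Query to User Facing Results. It is an illustration under assumptions, not the actual implementation: `generate_sql` is a hypothetical stub standing in for the trained LLM, an in-memory SQLite table named `batting` with made-up rows stands in for the Retrosheet database, and the validation rule (one read-only `SELECT` that compiles) is only one possible policy.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")  # stands in for the Logging & Monitoring node

# Hypothetical template; the real prompt wording is not given in the text.
PROMPT_TEMPLATE = (
    "Translate the question into one SQLite SELECT statement.\n"
    "Question: {question}\nSQL:"
)


def preprocess(query: str) -> str:
    """Query Preprocessor: trim and collapse whitespace."""
    return " ".join(query.split())


def build_prompt(question: str) -> str:
    """Prompt Template: wrap the cleaned question in fixed instructions."""
    return PROMPT_TEMPLATE.format(question=question)


def generate_sql(prompt: str) -> str:
    """SQL Generation: hypothetical stub; a real system would call the trained LLM."""
    return "SELECT player, hr FROM batting ORDER BY hr DESC LIMIT 1"


def validate_sql(conn: sqlite3.Connection, sql: str) -> str:
    """SQL Validator: accept a single read-only SELECT that compiles."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only a single SELECT statement is allowed")
    try:
        # EXPLAIN compiles the statement without running the query itself.
        conn.execute("EXPLAIN " + statement)
    except sqlite3.Error as exc:
        raise ValueError(f"invalid SQL: {exc}") from exc
    return statement


def answer(conn: sqlite3.Connection, user_query: str) -> list:
    """Run the online path end to end, logging each stage."""
    question = preprocess(user_query)
    log.info("question=%r", question)
    sql = validate_sql(conn, generate_sql(build_prompt(question)))
    log.info("sql=%r", sql)
    return conn.execute(sql).fetchall()


def demo_db() -> sqlite3.Connection:
    """Toy stand-in for the Retrosheet database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE batting (player TEXT, hr INTEGER)")
    conn.executemany(
        "INSERT INTO batting VALUES (?, ?)",
        [("Player A", 10), ("Player B", 25)],
    )
    return conn


if __name__ == "__main__":
    print(answer(demo_db(), "  Who hit the most   home runs? "))
```

Keeping each stage as a separate function mirrors the boxes in the diagram, so the stub generator can later be swapped for a real model call without touching the validator or the database access.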