The takeaway: this is a complex system, and to define it we need to think in terms of components at the right level of abstraction.
The end-to-end system diagram illustrates how we train, evaluate, log, and interface with the application. The primary high-level components are each encoded in their own color:
- User Facing (green)
- “Offline” Training (blue)
- “Online” Inferencing (yellow)
- Logging and Observability (white)
- External Resources (red)

%%{ init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#000', 'primaryTextColor': '#fff', 'primaryBorderColor': '#7C0000', 'lineColor': '#000', 'secondaryColor': '#006100', 'tertiaryColor': '#fff', 'fontSize' : '32px' } } }%%
flowchart TD
A["User Query"] --> B["Query Preprocessor"]
B --> C["Prompt Template"]
D["Training Data"] --> E["Data Augmentation"]
E --> F["Prompt Engineering"]
G["Base LLM"] --> H["Fine-tuning"]
F --> H
I["Evaluation Metrics"] --> H
H --> J["Trained LLM"]
J --> C
C --> K["SQL Generation"]
K --> L["SQL Validator"]
L --> M["Retrosheet Database"]
M --> O["User Facing Results"]
P["Logging & Monitoring"] <--> B & E & F & H & K & L & M
style A fill: green
style O fill: green
style D fill: blue
style E fill: blue
style F fill: blue
style I fill: blue
style H fill: blue
style J fill: blue
style C fill: yellow, color: black
style K fill: yellow, color: black
style L fill: yellow, color: black
style B fill: yellow, color: black
style G fill: red
style M fill: red
style P fill: white, color: black
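
To make the "online" (yellow) path in the diagram more concrete, here is a minimal Python sketch of how its stages might be chained together. Everything in it is an assumption for illustration: the function names, the prompt wording, the validation rules, the stubbed `fake_llm`, and the in-memory SQLite table with made-up rows standing in for the Retrosheet database. It is not the actual implementation behind the diagram.

```python
import logging
import sqlite3
from typing import Callable

log = logging.getLogger("text2sql")  # stands in for "Logging & Monitoring"

# Hypothetical prompt wording; the real template is not shown in the source.
PROMPT_TEMPLATE = (
    "Translate the question into a single SQLite SELECT statement.\n"
    "Question: {question}\nSQL:"
)


def preprocess_query(raw: str) -> str:
    """Query Preprocessor: here, only collapses runs of whitespace."""
    return " ".join(raw.split())


def build_prompt(question: str) -> str:
    """Prompt Template: wrap the cleaned question in the template."""
    return PROMPT_TEMPLATE.format(question=question)


def validate_sql(sql: str, conn: sqlite3.Connection) -> str:
    """SQL Validator: accept one read-only SELECT that SQLite can compile."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    try:
        # EXPLAIN compiles the statement without running the query itself.
        conn.execute("EXPLAIN " + statement)
    except sqlite3.Error as exc:
        raise ValueError(f"invalid SQL: {exc}") from exc
    return statement


def answer(raw_query: str,
           generate_sql: Callable[[str], str],
           conn: sqlite3.Connection) -> list:
    """Run one query through preprocess -> prompt -> generate -> validate -> DB."""
    question = preprocess_query(raw_query)
    log.info("preprocessed query: %s", question)
    sql = validate_sql(generate_sql(build_prompt(question)), conn)
    log.info("validated SQL: %s", sql)
    return conn.execute(sql).fetchall()


def fake_llm(prompt: str) -> str:
    """Stub for the trained LLM: always returns the same canned query."""
    return "SELECT player, hr FROM batting ORDER BY hr DESC LIMIT 1;"


def demo() -> list:
    """Toy run against an in-memory table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE batting (player TEXT, hr INTEGER)")
    conn.executemany(
        "INSERT INTO batting VALUES (?, ?)",
        [("Player A", 10), ("Player B", 25)],
    )
    return answer("  who hit   the most home runs? ", fake_llm, conn)


if __name__ == "__main__":
    print(demo())
```

Passing the model in as a plain callable keeps the pipeline testable without a trained LLM, and it mirrors the diagram's separation between the offline training path that produces the model and the online path that only consumes it.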