Deep Research RAG Evaluation (In Progress)
I wanted to try out Anthropic's statistical evaluation method for determining which LLM system performs better, so I decided to compare deep research results from five different generative AI setups: simple naive RAG, RAG with contextual embeddings, the GPT-Researcher package, a homemade agentic deep research RAG, and GPT-Researcher with contextual embeddings.
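To make the comparison idea concrete, here is a minimal sketch of a paired-difference check between two systems scored on the same questions, which is one of the techniques usually associated with that statistical approach. The function name `paired_difference`, the per-question scores, and the 95% z-value are all assumptions for illustration; this is not the project's actual evaluation code.

```python
import math
from statistics import mean, stdev


def paired_difference(scores_a, scores_b, z=1.96):
    """Compare two systems scored on the same questions (hypothetical helper).

    Returns the mean per-question difference (A minus B), its standard
    error, and a normal-approximation confidence interval.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same questions")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two paired scores")

    # Pairing removes question-level difficulty from the comparison.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = mean(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    sem = stdev(diffs) / math.sqrt(n)
    ci = (mean_diff - z * sem, mean_diff + z * sem)
    return {
        "mean_diff": mean_diff,
        "sem": sem,
        "ci": ci,
        # The interval excluding zero suggests a real difference at this level.
        "significant": ci[0] > 0 or ci[1] < 0,
    }


# Made-up scores for two of the five setups, for illustration only.
naive_rag = [0.6, 0.6, 0.7, 0.5]
contextual_rag = [0.8, 0.7, 0.9, 0.6]
result = paired_difference(contextual_rag, naive_rag)
print(result)
```

With only a handful of questions the normal approximation is rough, so a real run would want a much larger question set before reading anything into the interval.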
LangGraph, RAG, Docker, Python, DSPy, Prompt Engineering, Agentic AI, Qdrant Vector Database, Hypothesis Testing