What is Ragnerock?
An intelligence layer for your data lake.
Ragnerock is a research intelligence platform that transforms unstructured inputs into structured, queryable data. Upload documents, define extraction schemas, and query the results with SQL or natural language.
Core Capabilities
Document Processing
Ragnerock processes documents through a multi-stage pipeline:
- Ingestion — Upload PDFs, Word documents, Excel files, PowerPoint presentations, or plain text
- Text Extraction — Extract text content, tables, page boundaries, and document structure
- Chunking — Split documents into semantic chunks optimized for search and analysis
- Embedding — Generate vector embeddings for semantic search across your library
Supported formats include PDF, DOCX, DOC, XLSX, PPTX, TXT, HTML, and Markdown.
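To make the chunking and embedding stages concrete, here is a minimal, self-contained sketch of the idea. It is not Ragnerock's implementation. The paragraph-packing chunker, the `max_chars` budget, and the bag-of-words vectors (a stand-in for a learned embedding model) are all assumptions chosen so the example runs with only the standard library. Every function name below is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars.

    A paragraph longer than max_chars becomes a chunk on its own instead of
    being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank chunks by similarity to the query and return the top_k (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


doc = (
    "Revenue grew 12% year over year.\n\n"
    "Headcount was flat.\n\n"
    "Revenue guidance was raised for next quarter."
)
chunks = chunk_text(doc, max_chars=60)
print(len(chunks))  # 2
print(search("revenue guidance", chunks, top_k=1)[0][1])  # Revenue guidance was raised for next quarter.
```

A production pipeline would typically swap `embed` for a neural embedding model and keep the vectors in an index, but the ranking step follows the same shape: embed the query, score each chunk by similarity, and return the best matches.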
Annotation System
Define custom extraction schemas using Operators. Each operator combines three parts:
- JSON Schema — Specify the exact structure of data to extract
- Generation Prompt — Provide AI instructions for extraction
- Scope — Choose granularity: document, page, paragraph, or sentence level
Chain multiple operators into Workflows to build multi-step processing pipelines. All extracted data can be queried with SQL.
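The three parts of an operator can be pictured as a small data structure. The sketch below is a hypothetical local representation, not the Ragnerock SDK's operator API: the `OperatorSpec` class, the `missing_required_fields` helper, and both example schemas are invented for illustration. It models a workflow as an ordered list of operators and checks an extracted record against the schema's `required` list.

```python
from dataclasses import dataclass

VALID_SCOPES = {"document", "page", "paragraph", "sentence"}


@dataclass
class OperatorSpec:
    """Hypothetical local stand-in for an operator: schema + prompt + scope."""
    name: str
    json_schema: dict
    prompt: str
    scope: str = "document"

    def __post_init__(self) -> None:
        # Only the four granularities described in this section are accepted.
        if self.scope not in VALID_SCOPES:
            raise ValueError(f"scope must be one of {sorted(VALID_SCOPES)}, got {self.scope!r}")


def missing_required_fields(spec: OperatorSpec, record: dict) -> list[str]:
    """Return the schema's required keys that an extracted record lacks."""
    return [key for key in spec.json_schema.get("required", []) if key not in record]


key_figures = OperatorSpec(
    name="key_figures",
    json_schema={
        "type": "object",
        "properties": {"revenue": {"type": "number"}},
        "required": ["revenue"],
    },
    prompt="Extract the reported revenue figure from this page, if any.",
    scope="page",
)

sentiment = OperatorSpec(
    name="sentiment_analysis",
    json_schema={
        "type": "object",
        "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["sentiment", "confidence"],
    },
    prompt="Classify the sentiment of this paragraph and give a confidence between 0 and 1.",
    scope="paragraph",
)

# In this sketch a workflow is simply an ordered list of operators.
workflow = [key_figures, sentiment]

print(missing_required_fields(sentiment, {"sentiment": "positive"}))  # ['confidence']
```

Checking only the `required` keys keeps the sketch dependency-free; full JSON Schema validation (types, enums, numeric ranges) would need a validator library such as `jsonschema`.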
Research Agent
A conversational AI assistant in the web application that can:
- Search semantically across your entire document library
- Query your structured annotation data
- Synthesize insights from multiple documents
- Provide citations linking every claim to source documents
Provenance
Every annotation maintains full provenance:
- Document — The source document
- Page — The specific page (if applicable)
- Chunk — The exact text passage
- Operator — The extraction schema used
This enables complete audit trails and verification of any extracted data point.
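One way to picture a provenance-carrying annotation is as an extracted value bundled with a record of its origin. The `Provenance` and `Annotation` classes and the `audit_line` formatter below are hypothetical: a sketch assuming the four fields listed above, with `page` left empty for sources that have no pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record of where an extracted value came from."""
    document: str               # source document
    chunk: str                  # exact text passage
    operator: str               # extraction schema used
    page: Optional[int] = None  # None when the source has no pages


@dataclass
class Annotation:
    """An extracted value plus its provenance."""
    value: dict
    provenance: Provenance


def audit_line(annotation: Annotation) -> str:
    """Format one annotation as a human-readable audit-trail entry."""
    p = annotation.provenance
    where = f"p. {p.page}" if p.page is not None else "no page"
    return f'{annotation.value} <- {p.operator} on {p.document} ({where}): "{p.chunk}"'


ann = Annotation(
    value={"sentiment": "positive"},
    provenance=Provenance(
        document="q3_report.pdf",
        chunk="Revenue grew 12% year over year.",
        operator="sentiment_analysis",
        page=4,
    ),
)
print(audit_line(ann))
# {'sentiment': 'positive'} <- sentiment_analysis on q3_report.pdf (p. 4): "Revenue grew 12% year over year."
```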
Integration Points
Python SDK
Connect using an SQLAlchemy-style pattern:
```python
from ragnerock import create_engine, Session, Document

# Replace the credentials and project name with your own
engine = create_engine("ragnerock://user@example.com:pass@api.ragnerock.com/my_project")

with Session(engine) as session:
    # List documents
    for doc in session.list(Document):
        print(doc.name)

    # Query annotations with SQL
    result = session.query("SELECT * FROM sentiment_analysis LIMIT 10")
    df = result.to_pandas()
```
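The connection string appears to follow the `scheme://user:password@host/database` layout of SQLAlchemy URLs, with an email address as the user and the project name in the database position. Assuming that reading holds, Python's standard `urllib.parse.urlsplit` breaks the example URL into its parts as shown. It treats the last `@` as the boundary between credentials and host, so the `@` inside the email address stays in the username. The `url_from_env` helper and its environment variable names are hypothetical, included only to show one way to keep a password out of notebook source.

```python
import os
from urllib.parse import urlsplit

url = "ragnerock://user@example.com:pass@api.ragnerock.com/my_project"
parts = urlsplit(url)

print(parts.scheme)            # ragnerock
print(parts.username)          # user@example.com
print(parts.password)          # pass
print(parts.hostname)          # api.ragnerock.com
print(parts.path.lstrip("/"))  # my_project


def url_from_env() -> str:
    """Build the connection URL from environment variables (hypothetical names).

    Assumes the password contains no '/', '?' or '#', any of which would end
    the credentials section early when the URL is split.
    """
    email = os.environ["RAGNEROCK_EMAIL"]
    password = os.environ["RAGNEROCK_PASSWORD"]
    project = os.environ.get("RAGNEROCK_PROJECT", "my_project")
    return f"ragnerock://{email}:{password}@api.ragnerock.com/{project}"


# Usage, with the SDK installed and the variables set:
# engine = create_engine(url_from_env())
```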
SQL Query Interface
Query annotation data directly with SQL:
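```python
# Runs inside the `with Session(engine) as session:` block from the Python SDK example
result = session.query("""
    SELECT document_name, sentiment, confidence
    FROM sentiment_analysis
    WHERE confidence > 0.8
    ORDER BY confidence DESC
""")
```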
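The query itself is ordinary SQL. To see what it returns without an account, the sketch below runs the same statement against an in-memory SQLite table. The column types and the three rows are made up for illustration, and the sketch assumes Ragnerock's SQL dialect treats `WHERE` and `ORDER BY ... DESC` the way SQLite does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; only the column names come from the query above.
conn.execute(
    "CREATE TABLE sentiment_analysis (document_name TEXT, sentiment TEXT, confidence REAL)"
)
conn.executemany(
    "INSERT INTO sentiment_analysis VALUES (?, ?, ?)",
    [
        ("q1_call.pdf", "positive", 0.92),
        ("q2_call.pdf", "neutral", 0.55),
        ("q3_call.pdf", "negative", 0.81),
    ],
)

rows = conn.execute(
    """
    SELECT document_name, sentiment, confidence
    FROM sentiment_analysis
    WHERE confidence > 0.8
    ORDER BY confidence DESC
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
# ('q1_call.pdf', 'positive', 0.92)
# ('q3_call.pdf', 'negative', 0.81)
```

The low-confidence row is filtered out and the remaining rows come back highest-confidence first.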
JupyterLab Integration
The Ragnerock JupyterLab extension provides:
- Notebook sidebar for Research Agent conversations
- Export agent responses as Python objects
- A direct path from conversational exploration to code-based analysis in the same notebook
Getting Started
- Installation — Install the Python SDK
- Quick Start — Upload a document and run your first workflow
- Core Concepts — Understand documents, annotations, and the research agent
- SDK Reference — Full Python SDK documentation