What is Ragnerock?

Ragnerock is a research intelligence platform that transforms unstructured inputs into structured, queryable data. Upload documents, define extraction schemas, and query the results with SQL or natural language.

Core Capabilities

Document Processing

Ragnerock processes documents through a multi-stage pipeline:

Ingestion — Upload PDFs, Word documents, Excel files, PowerPoint presentations, or plain text
Text Extraction — Extract text content, tables, page boundaries, and document structure
Chunking — Split documents into semantic chunks optimized for search and analysis
Embedding — Generate vector embeddings for semantic search across your library

Supported formats include PDF, DOCX, DOC, XLSX, PPTX, TXT, HTML, and Markdown.
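
As a rough illustration of the chunking stage, the sketch below splits extracted text on blank lines and greedily packs paragraphs into chunks under a character budget. This is not Ragnerock's implementation: the function name `chunk_paragraphs` and the `max_chars` parameter are assumptions made for this example, and a real semantic chunker would likely consider more than paragraph boundaries.

```python
from typing import List


def chunk_paragraphs(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Hypothetical helper for illustration only. A single paragraph longer
    than max_chars is kept whole as its own chunk rather than split.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Either the paragraph fits, or it starts a new (possibly oversized) chunk.
            current = candidate
        else:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for i, chunk in enumerate(chunk_paragraphs(sample, max_chars=40)):
        print(i, repr(chunk))
```

Each chunk would then be passed to the embedding stage; that step is omitted here because it depends on the embedding model in use.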

Annotation System

Define custom extraction schemas using Operators:

JSON Schema — Specify the exact structure of data to extract
Generation Prompt — Provide AI instructions for extraction
Scope — Choose granularity: document, page, paragraph, or sentence level

Chain multiple operators into Workflows for complex processing pipelines. All extracted data is SQL-queryable.
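
To make the three parts of an Operator concrete, here is a minimal sketch of what a definition might look like as plain Python data. The key names (`name`, `scope`, `prompt`, `schema`) and the `check_operator` helper are assumptions for illustration, not the SDK's actual fields or functions; the `sentiment` and `confidence` properties mirror the columns used in the SQL examples on this page.

```python
import json

# Hypothetical operator definition: the key names are assumptions for
# illustration, not the Ragnerock SDK's actual field names.
sentiment_operator = {
    "name": "sentiment_analysis",
    "scope": "paragraph",  # one of: document, page, paragraph, sentence
    "prompt": "Classify the sentiment of the passage and rate your confidence.",
    "schema": {
        "type": "object",
        "properties": {
            "sentiment": {
                "type": "string",
                "enum": ["positive", "neutral", "negative"],
            },
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["sentiment", "confidence"],
    },
}

VALID_SCOPES = {"document", "page", "paragraph", "sentence"}


def check_operator(op: dict) -> None:
    """Basic sanity checks on the sketch; not a full JSON Schema validation."""
    if op["scope"] not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {op['scope']}")
    missing = set(op["schema"]["required"]) - set(op["schema"]["properties"])
    if missing:
        raise ValueError(f"required fields not defined: {sorted(missing)}")


check_operator(sentiment_operator)
print(json.dumps(sentiment_operator["schema"], indent=2))
```

Keeping the schema as ordinary JSON Schema means the same structure can be reviewed, versioned, and validated with standard tooling before it is attached to an operator.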

Research Agent

The Research Agent is a conversational AI assistant in the web application. It can:

Search semantically across your entire document library
Query your structured annotation data
Synthesize insights from multiple documents
Provide citations linking every claim to source documents

Provenance

Every annotation maintains full provenance:

Document — The source document
Page — The specific page (if applicable)
Chunk — The exact text passage
Operator — The extraction schema used

This enables complete audit trails and verification of any extracted data point.
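
One way to picture a provenance record is as a small immutable structure carrying the four fields listed above. The sketch below is illustrative only: the `Provenance` class, its field names, and the `citation` method are hypothetical and not part of the Ragnerock SDK, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record; field names are illustrative."""

    document: str  # the source document
    chunk: str  # the exact text passage
    operator: str  # the extraction schema used
    page: Optional[int] = None  # the specific page, if applicable

    def citation(self) -> str:
        """Format a short human-readable citation for an annotation."""
        where = f", p. {self.page}" if self.page is not None else ""
        return f'{self.document}{where} [{self.operator}]: "{self.chunk}"'


if __name__ == "__main__":
    # Made-up example values.
    prov = Provenance(
        document="q3_report.pdf",
        chunk="Revenue grew 12%.",
        operator="sentiment_analysis",
        page=4,
    )
    print(prov.citation())
```

Making the record frozen reflects the audit-trail use case: once an annotation is produced, the link back to its source should not change.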

Integration Points

Python SDK

Connect using an SQLAlchemy-style pattern:

from ragnerock import create_engine, Session, Document

engine = create_engine("ragnerock://user@example.com:pass@api.ragnerock.com/my_project")

with Session(engine) as session:
    # List documents
    for doc in session.list(Document):
        print(doc.name)

    # Query annotations with SQL
    result = session.query("SELECT * FROM sentiment_analysis LIMIT 10")
    df = result.to_pandas()

SQL Query Interface

Query annotation data directly with SQL:

# Assumes `session` is an open Session, as in the Python SDK example above
result = session.query("""
    SELECT document_name, sentiment, confidence
    FROM sentiment_analysis
    WHERE confidence > 0.8
    ORDER BY confidence DESC
""")

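To see what the query above selects without a Ragnerock project, the following sketch runs the same SQL against an in-memory SQLite table. SQLite is only a stand-in for the Ragnerock query engine here, and the table contents are invented for illustration; the actual columns of an annotation table depend on the operator's schema.

```python
import sqlite3

# In-memory SQLite stands in for the Ragnerock query engine in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sentiment_analysis "
    "(document_name TEXT, sentiment TEXT, confidence REAL)"
)
# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO sentiment_analysis VALUES (?, ?, ?)",
    [
        ("report_a.pdf", "positive", 0.95),
        ("report_b.pdf", "negative", 0.60),
        ("report_c.pdf", "neutral", 0.85),
    ],
)

# Same query as in the SDK example: keep high-confidence rows, best first.
rows = conn.execute(
    """
    SELECT document_name, sentiment, confidence
    FROM sentiment_analysis
    WHERE confidence > 0.8
    ORDER BY confidence DESC
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Only the rows for `report_a.pdf` and `report_c.pdf` pass the `confidence > 0.8` filter, and they come back ordered from highest to lowest confidence.
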
JupyterLab Integration

The Ragnerock JupyterLab extension provides:

A notebook sidebar for Research Agent conversations
Export of agent responses as Python objects
A seamless transition from exploration to analysis

Getting Started

Installation — Install the Python SDK
Quick Start — Upload a document and run your first workflow
Core Concepts — Understand documents, annotations, and the research agent
SDK Reference — Full Python SDK documentation