What is Ragnerock?

An intelligence layer for your data lake.

Ragnerock is a research intelligence platform that transforms unstructured inputs into structured, queryable data. Upload documents, define extraction schemas, and query the results with SQL or natural language.

Core Capabilities

Document Processing

Ragnerock processes documents through a multi-stage pipeline:

  1. Ingestion — Upload PDFs, Word documents, Excel files, PowerPoint presentations, or plain text
  2. Text Extraction — Extract text content, tables, page boundaries, and document structure
  3. Chunking — Split documents into semantic chunks optimized for search and analysis
  4. Embedding — Generate vector embeddings for semantic search across your library

Supported formats include PDF, DOCX, DOC, XLSX, PPTX, TXT, HTML, and Markdown.
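
To make the chunking stage concrete, here is a minimal sketch that packs paragraphs into size-limited chunks while remembering which page each chunk came from. It is a simple paragraph-packing stand-in, not Ragnerock's semantic chunker; the `Chunk` class, `chunk_pages` function, and `max_chars` parameter are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int   # 1-based page the text came from
    index: int  # position of the chunk within the whole document
    text: str


def chunk_pages(pages: list[str], max_chars: int = 200) -> list[Chunk]:
    """Pack paragraphs into chunks of roughly max_chars, never crossing a page.

    A single paragraph longer than max_chars is kept whole rather than
    being cut mid-sentence.
    """
    chunks: list[Chunk] = []
    for page_no, page_text in enumerate(pages, start=1):
        buffer = ""
        for para in (p.strip() for p in page_text.split("\n\n")):
            if not para:
                continue
            candidate = f"{buffer}\n\n{para}" if buffer else para
            if buffer and len(candidate) > max_chars:
                # The buffer is full: emit it and start a new chunk.
                chunks.append(Chunk(page_no, len(chunks), buffer))
                buffer = para
            else:
                buffer = candidate
        if buffer:
            chunks.append(Chunk(page_no, len(chunks), buffer))
    return chunks


if __name__ == "__main__":
    pages = ["First paragraph.\n\nSecond paragraph.", "Only paragraph on page two."]
    for chunk in chunk_pages(pages, max_chars=20):
        print(chunk.page, chunk.index, repr(chunk.text))
```

Keeping the page number on each chunk is what later allows an extracted value to be traced back to a specific page.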

Annotation System

Define custom extraction schemas using Operators:

  • JSON Schema — Specify the exact structure of data to extract
  • Generation Prompt — Provide AI instructions for extraction
  • Scope — Choose granularity: document, page, paragraph, or sentence level

Chain multiple operators into Workflows for complex processing pipelines. All extracted data is SQL-queryable.
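
As an illustration of how the three parts of an Operator fit together, the sketch below describes a sentiment operator as a plain Python dictionary and runs a few sanity checks on it. The dictionary keys (`name`, `scope`, `generation_prompt`, `json_schema`) and the `check_operator` helper are assumptions made for this example and may not match the field names Ragnerock actually uses. Only the JSON Schema portion follows a published standard.

```python
import json

# Hypothetical operator definition; key names are illustrative only.
sentiment_operator = {
    "name": "sentiment_analysis",
    "scope": "paragraph",
    "generation_prompt": (
        "Classify the sentiment of the passage and rate your confidence "
        "from 0 to 1."
    ),
    "json_schema": {
        "type": "object",
        "properties": {
            "sentiment": {
                "type": "string",
                "enum": ["positive", "neutral", "negative"],
            },
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["sentiment", "confidence"],
        "additionalProperties": False,
    },
}

# The four scope levels named in this document.
VALID_SCOPES = {"document", "page", "paragraph", "sentence"}


def check_operator(op: dict) -> list[str]:
    """Return a list of problems found in an operator definition."""
    problems = []
    if op.get("scope") not in VALID_SCOPES:
        problems.append(f"unknown scope: {op.get('scope')!r}")
    if not op.get("generation_prompt", "").strip():
        problems.append("generation_prompt is empty")
    schema = op.get("json_schema", {})
    if schema.get("type") != "object":
        problems.append("json_schema should describe an object")
    missing = set(schema.get("required", [])) - set(schema.get("properties", {}))
    if missing:
        problems.append(f"required fields without a definition: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    print(json.dumps(sentiment_operator["json_schema"], indent=2))
    print(check_operator(sentiment_operator))
```

The schema properties here (`sentiment`, `confidence`) were chosen to line up with the columns used in the SQL examples later on this page.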

Research Agent

The Research Agent is a conversational AI assistant in the web application. It can:

  • Search semantically across your entire document library
  • Query your structured annotation data
  • Synthesize insights from multiple documents
  • Provide citations linking every claim to source documents
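
For intuition about what semantic search involves, the sketch below ranks chunks by cosine similarity between a query vector and stored chunk vectors. It is a toy illustration of the general technique, not the Research Agent's implementation. The three-dimensional vectors and chunk identifiers are made up, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2):
    """Return the k chunk ids most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Made-up embeddings keyed by a hypothetical "document#page" identifier.
chunk_vecs = {
    "report.pdf#p3": [0.9, 0.1, 0.0],
    "memo.docx#p1": [0.1, 0.9, 0.1],
    "notes.txt#p1": [0.7, 0.3, 0.1],
}

if __name__ == "__main__":
    for chunk_id, score in top_k([1.0, 0.0, 0.0], chunk_vecs):
        print(f"{chunk_id}: {score:.3f}")
```

Because each result carries a chunk identifier, a ranked hit can be reported alongside the passage it came from, which is the basis for citation.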

Provenance

Every annotation maintains full provenance:

  • Document — The source document
  • Page — The specific page (if applicable)
  • Chunk — The exact text passage
  • Operator — The extraction schema used

This gives every extracted data point a complete audit trail, so any value can be verified against the passage it came from.
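
One way to picture a provenance record is as a small immutable structure with one field per item in the list above. The sketch below is an assumption about shape only: the `Provenance` class, its field names, and the `citation` method are hypothetical and are not taken from the Ragnerock SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    document: str         # the source document
    page: Optional[int]   # the specific page, or None if not applicable
    chunk: str            # the exact text passage
    operator: str         # the extraction schema used

    def citation(self) -> str:
        """Format the record as a one-line, human-readable citation."""
        location = f", p. {self.page}" if self.page is not None else ""
        return f'{self.document}{location} [{self.operator}]: "{self.chunk}"'


if __name__ == "__main__":
    record = Provenance(
        document="report.pdf",
        page=3,
        chunk="Revenue grew 12%.",
        operator="sentiment_analysis",
    )
    print(record.citation())
```

Making the record frozen reflects the audit-trail goal: once an annotation has been produced, the link to its source should not change.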

Integration Points

Python SDK

Connect using an SQLAlchemy-style pattern:

from ragnerock import create_engine, Session, Document

# Connection URL: ragnerock://<user email>:<password>@<host>/<project>
engine = create_engine("ragnerock://user@example.com:pass@api.ragnerock.com/my_project")

with Session(engine) as session:
    # List documents
    for doc in session.list(Document):
        print(doc.name)

    # Query annotations with SQL
    result = session.query("SELECT * FROM sentiment_analysis LIMIT 10")
    df = result.to_pandas()
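
In practice you may prefer not to hardcode credentials in the connection URL. The sketch below assembles a URL of the same shape as the example above from environment variables. The variable names `RAGNEROCK_USER` and `RAGNEROCK_PASSWORD` and the `build_connection_url` helper are hypothetical, and the sketch does not import the SDK. Whether special characters in credentials need escaping is not covered here.

```python
import os


def build_connection_url(project: str, host: str = "api.ragnerock.com") -> str:
    """Build a ragnerock:// URL from credentials stored in the environment.

    RAGNEROCK_USER and RAGNEROCK_PASSWORD are hypothetical variable names
    chosen for this example; a KeyError is raised if either is missing.
    """
    user = os.environ["RAGNEROCK_USER"]
    password = os.environ["RAGNEROCK_PASSWORD"]
    return f"ragnerock://{user}:{password}@{host}/{project}"


if __name__ == "__main__":
    # Set here only so the example runs on its own.
    os.environ.setdefault("RAGNEROCK_USER", "user@example.com")
    os.environ.setdefault("RAGNEROCK_PASSWORD", "pass")
    print(build_connection_url("my_project"))
```

The resulting string can then be passed to `create_engine` in place of the literal URL.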

SQL Query Interface

Query annotation data directly with SQL:

# Assumes `session` is an open Session, as in the Python SDK example
result = session.query("""
    SELECT document_name, sentiment, confidence
    FROM sentiment_analysis
    WHERE confidence > 0.8
    ORDER BY confidence DESC
""")

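To see what this query returns without a Ragnerock project, the sketch below runs the same SQL against an in-memory SQLite table. SQLite is only a stand-in here, and the table contents are made-up sample rows. The column types are an assumption based on the column names used in the query.

```python
import sqlite3


def high_confidence_rows() -> list[tuple]:
    """Run the example query against a small in-memory stand-in table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sentiment_analysis ("
        "document_name TEXT, sentiment TEXT, confidence REAL)"
    )
    # Made-up sample rows for illustration only.
    conn.executemany(
        "INSERT INTO sentiment_analysis VALUES (?, ?, ?)",
        [
            ("q1_report.pdf", "positive", 0.93),
            ("memo.docx", "neutral", 0.55),
            ("q2_report.pdf", "negative", 0.87),
        ],
    )
    rows = conn.execute("""
        SELECT document_name, sentiment, confidence
        FROM sentiment_analysis
        WHERE confidence > 0.8
        ORDER BY confidence DESC
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in high_confidence_rows():
        print(row)
```

The low-confidence row is filtered out and the remaining rows come back with the highest confidence first.
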
JupyterLab Integration

The Ragnerock JupyterLab extension provides:

  • A notebook sidebar for Research Agent conversations
  • Export of agent responses as Python objects
  • A direct path from exploration in chat to analysis in code

Getting Started

  1. Installation — Install the Python SDK
  2. Quick Start — Upload a document and run your first workflow
  3. Core Concepts — Understand documents, annotations, and the research agent
  4. SDK Reference — Full Python SDK documentation