SDK Overview

An introduction to the Ragnerock Python SDK.

The Ragnerock Python SDK provides a high-level interface to the Ragnerock platform. It uses an SQLAlchemy-inspired pattern with Engine and Session objects, making it familiar to Python developers who work with databases.

Installation

pip install ragnerock

Or with uv:

uv add ragnerock

Quick Example

from ragnerock import create_engine, Session, Document, Annotation

# Connect to your project
engine = create_engine("ragnerock://user@example.com:pass@api.ragnerock.com/my_project")

with Session(engine) as session:
    # List all documents
    for doc in session.list(Document):
        print(f"{doc.name} - {doc.status}")

    # Get annotations for a document
    doc = session.get(Document, name="Apple 10-K 2024")
    for ann in doc.list(Annotation, operator="financial_metrics"):
        print(ann.data)

    # Query annotation data with SQL
    result = session.query("""
        SELECT document_name, revenue, net_income
        FROM financial_metrics
        WHERE revenue > 100000
    """)
    df = result.to_pandas()  # needs the optional pandas extra (see Optional Dependencies)

Core Concepts

The SDK is organized around a few key concepts:

Concept            Description
Engine             Holds connection configuration and manages authentication
Session            A context manager for interacting with a project
Resources          Data objects like Document, Annotation, Operator, Workflow
PaginatedIterator  Lazy iterator for efficient pagination
QueryResult        Results from SQL queries on annotation data
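
To make "lazy iterator" concrete, here is a minimal sketch of the general technique: pages are fetched only when the consumer asks for more items. This is not the SDK's PaginatedIterator; `paginate`, `fake_fetch`, and the offset/limit signature are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def paginate(fetch_page: Callable[[int, int], list[T]], page_size: int = 2) -> Iterator[T]:
    """Yield items one by one, requesting a new page only when needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += page_size


# Hypothetical in-memory "server" that records which offsets were requested.
NAMES = ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"]
calls: list[int] = []


def fake_fetch(offset: int, limit: int) -> list[str]:
    calls.append(offset)
    return NAMES[offset:offset + limit]


it = paginate(fake_fetch)
first = next(it)  # only the first page has been requested at this point
```

Because the generator pauses between items, a loop that breaks early never requests the remaining pages.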

Connection String

Connect to Ragnerock using a connection string:

ragnerock://email:password@host/project_name

Examples:

# Production
engine = create_engine("ragnerock://user@company.com:pass@api.ragnerock.com/sec_analysis")

# Local development
engine = create_engine("ragnerock://dev@test.com:pass@localhost:8080/test_project")
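
Because the email address itself contains an `@`, it can help to see how the pieces of such a string line up. The sketch below splits a connection string on the last `@` and the first `:` of the credentials. It is an illustration only, not the SDK's parser; `parse_connection_string` and `ConnectionParts` are hypothetical names.

```python
from typing import NamedTuple


class ConnectionParts(NamedTuple):
    email: str
    password: str
    host: str
    project: str


def parse_connection_string(url: str) -> ConnectionParts:
    """Split ragnerock://email:password@host/project_name into its parts."""
    scheme, sep, rest = url.partition("://")
    if sep != "://" or scheme != "ragnerock":
        raise ValueError("expected a ragnerock:// connection string")
    # The host never contains '@', so the last '@' ends the credentials.
    credentials, at, location = rest.rpartition("@")
    # The email never contains ':', so the first ':' starts the password.
    email, colon, password = credentials.partition(":")
    # The first '/' separates host (with optional port) from the project.
    host, slash, project = location.partition("/")
    if not (at and email and colon and host and slash and project):
        raise ValueError("expected ragnerock://email:password@host/project_name")
    return ConnectionParts(email, password, host, project)


parts = parse_connection_string("ragnerock://dev@test.com:pass@localhost:8080/test_project")
print(parts.host, parts.project)
```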

Session Pattern

All operations happen within a Session context:

from ragnerock import create_engine, Session, Document

engine = create_engine("ragnerock://...")

with Session(engine) as session:
    # get() - Retrieve a single resource
    doc = session.get(Document, id="uuid-here")
    doc = session.get(Document, name="My Document")

    # list() - Iterate over resources
    for doc in session.list(Document):
        print(doc.name)

    # create() - Create a new resource
    doc = Document(file_path="/path/to/file.pdf", name="New Doc")
    session.create(doc)

    # delete() - Delete a resource
    session.delete(doc)

    # query() - Run SQL queries
    result = session.query("SELECT * FROM annotations")

    # run() - Execute workflows
    # (workflow is a Workflow resource obtained beforehand)
    job = session.run(workflow, documents=[doc])
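
The value of the `with` form is that cleanup runs even when the block raises. The sketch below shows the general context-manager mechanism with a stand-in class; `ToySession` is hypothetical and says nothing about how the SDK's Session is implemented or what it releases on exit.

```python
class ToySession:
    """Hypothetical stand-in for a session; not the Ragnerock implementation."""

    def __init__(self, engine_url: str) -> None:
        self.engine_url = engine_url
        self.closed = False

    def __enter__(self) -> "ToySession":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and also when the block raises.
        self.closed = True
        return False  # do not swallow the exception


session = ToySession("ragnerock://...")
try:
    with session:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

print(session.closed)  # True
```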

Resources

The SDK provides these resource types:

Resource    Description
Document    An uploaded document (PDF, Word, etc.)
Annotation  AI-generated structured data attached to a document
Operator    An annotation schema that defines extraction logic
Workflow    A DAG of operators that process documents
Job         A handle to track workflow execution
Chunk       A text segment within a document
Page        A page within a document

Type Safety

The SDK is fully typed and works with mypy and Pyright:

from ragnerock import create_engine, Session, Document, Annotation

engine = create_engine("ragnerock://...")

with Session(engine) as session:
    doc: Document | None = session.get(Document, name="report.pdf")
    annotations: list[Annotation] = doc.list(Annotation).all() if doc else []
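
One common way to give a `get(ResourceType, ...)` call a precise return type is a generic method parameterized on the class passed in. The sketch below shows that technique with stand-in classes; `ToyStore`, `ToyDocument`, and `ToyOperator` are hypothetical, and the SDK's actual signatures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TypeVar

T = TypeVar("T")


@dataclass
class ToyDocument:
    name: str


@dataclass
class ToyOperator:
    name: str


class ToyStore:
    """Hypothetical in-memory store with a typed lookup method."""

    def __init__(self, items: list[object]) -> None:
        self._items = items

    def get(self, resource_type: type[T], name: str) -> T | None:
        # The return type follows the class passed as the first argument.
        for item in self._items:
            if isinstance(item, resource_type) and getattr(item, "name", None) == name:
                return item
        return None


store = ToyStore([ToyDocument("report.pdf"), ToyOperator("financial_metrics")])
doc = store.get(ToyDocument, name="report.pdf")
if doc is not None:
    # After the None check, a type checker can treat doc as ToyDocument.
    print(doc.name)
```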

Error Handling

The SDK provides typed exceptions for different error conditions:

from ragnerock import (
    RagnerockError,
    AuthenticationError,
    NotFoundError,
    ValidationError,
    QueryError,
)

# Assumes an open session and an imported Document (see Session Pattern)
try:
    doc = session.get(Document, id="nonexistent")
except NotFoundError:
    print("Document not found")
except AuthenticationError:
    print("Invalid credentials")
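
When exceptions share a base class, handlers should go from most specific to most general, with the base class last as a catch-all. The sketch below demonstrates that ordering with stand-in classes. It assumes a hierarchy in which one base error is the parent of the others; the text above does not confirm that RagnerockError plays this role, and the `Sdk*` names are hypothetical.

```python
class SdkError(Exception):
    """Stand-in for a common base error (assumed, not confirmed by the SDK docs)."""


class SdkNotFoundError(SdkError):
    """Stand-in for a 'resource not found' error."""


class SdkQueryError(SdkError):
    """Stand-in for a failed SQL query."""


def describe(exc: Exception) -> str:
    """Raise exc and report which handler caught it."""
    try:
        raise exc
    except SdkNotFoundError:
        # Most specific handler first.
        return "not found"
    except SdkError as err:
        # Base class last, so it only sees errors not handled above.
        return f"other SDK error: {type(err).__name__}"


print(describe(SdkNotFoundError()))  # not found
print(describe(SdkQueryError()))     # other SDK error: SdkQueryError
```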

Optional Dependencies

Install optional dependencies for additional features:

# pandas support for query results
# (quotes keep shells such as zsh from expanding the brackets)
pip install "ragnerock[pandas]"

Next Steps