SQL Queries

Query your annotation data using SQL with the Ragnerock Python SDK.

Ragnerock stores AI-generated annotations in a queryable format. Use session.query() to run SQL queries against your annotation data and get results as Python objects or pandas DataFrames.

Basic Usage

from ragnerock import create_engine, Session

engine = create_engine("ragnerock://user@example.com:pass@api.ragnerock.com/my_project")

with Session(engine) as session:
    result = session.query("""
        SELECT document_name, sentiment_score
        FROM sentiment_annotations
        WHERE sentiment_score > 0.5
    """)

    print(f"Found {result.row_count} rows")

The QueryResult Class

session.query() returns a QueryResult object with the following attributes:

AttributeTypeDescription
columnslist[str]Column names in the result set
datalist[dict]Rows as a list of dictionaries
row_countintNumber of rows returned
query_time_msint | NoneQuery execution time in milliseconds
result = session.query("SELECT * FROM my_annotations LIMIT 10")

print(result.columns)       # ['id', 'document_name', 'score', ...]
print(result.row_count)     # 10
print(result.query_time_ms) # 23
print(len(result))          # 10 (supports len())

Converting Results

To List of Dictionaries

Use to_dict() to get results as a list of dictionaries, one per row:

result = session.query("""
    SELECT document_name, revenue, net_income
    FROM financial_metrics
    WHERE revenue > 1000000
""")

for row in result.to_dict():
    print(f"{row['document_name']}: ${row['revenue']}M revenue")

To pandas DataFrame

Use to_pandas() to convert results to a pandas DataFrame. This requires the pandas optional dependency:

pip install ragnerock[pandas]
result = session.query("""
    SELECT document_name, sentiment_score, key_topics
    FROM earnings_call_analysis
    ORDER BY sentiment_score DESC
""")

df = result.to_pandas()

# Now use pandas operations
print(df.describe())
print(df.groupby('key_topics').mean())

Query Parameters

Limiting Results

Pass a limit parameter to cap the number of rows returned:

# Return at most 100 rows (default is 1000)
result = session.query(
    "SELECT * FROM annotations",
    limit=100
)

Querying Annotation Tables

Each operator (annotation schema) you create generates a queryable table. The table name is derived from the operator name.

Finding Available Tables

# List all operators to see available annotation tables
for op in session.list(Operator):
    print(f"Table: {op.name}")
    print(f"  Description: {op.description}")

Table Structure

Annotation tables include:

  • All fields defined in the operator’s JSON schema
  • document_id — UUID of the source document
  • document_name — Name of the source document
  • chunk_id — UUID of the source chunk (if chunk-level annotation)
  • created_at — Timestamp when the annotation was created
  • confidence_score — Model confidence score (if available)

Example Queries

Aggregate statistics:

result = session.query("""
    SELECT
        AVG(sentiment_score) as avg_sentiment,
        MIN(sentiment_score) as min_sentiment,
        MAX(sentiment_score) as max_sentiment,
        COUNT(*) as total_documents
    FROM sentiment_analysis
""")

Join with document metadata:

result = session.query("""
    SELECT
        a.document_name,
        a.revenue,
        a.net_income,
        a.risk_factors
    FROM financial_metrics a
    WHERE a.revenue > 100000
    ORDER BY a.revenue DESC
    LIMIT 20
""")

Filter by date:

result = session.query("""
    SELECT document_name, key_findings
    FROM research_summaries
    WHERE created_at > '2024-01-01'
    ORDER BY created_at DESC
""")

Full-text search in annotation data:

result = session.query("""
    SELECT document_name, risk_factors
    FROM sec_filing_analysis
    WHERE risk_factors LIKE '%cybersecurity%'
""")

Error Handling

Query errors raise QueryError with details about what went wrong:

from ragnerock import QueryError

try:
    result = session.query("SELECT * FROM nonexistent_table")
except QueryError as e:
    print(f"Query failed: {e.message}")
    print(f"Error code: {e.error_code}")
    if e.suggestion:
        print(f"Suggestion: {e.suggestion}")

Next Steps