Pagination

Iterate over large result sets efficiently with the Ragnerock Python SDK.

When listing resources, the SDK returns a PaginatedIterator that fetches pages of results on demand. This allows you to efficiently work with large datasets without loading everything into memory at once.

Basic Usage

The session.list() method returns a PaginatedIterator:

from ragnerock import create_engine, Session, Document

engine = create_engine("ragnerock://user@example.com:pass@api.ragnerock.com/my_project")

with Session(engine) as session:
    # Iterate over all documents
    for doc in session.list(Document):
        print(doc.name)

Pages are fetched automatically as you iterate. The default page size is 100 items.

Convenience Methods

Get All Results

Use .all() to eagerly fetch all results into a list:

# Fetch all documents at once
all_docs = session.list(Document).all()

print(f"Total documents: {len(all_docs)}")

Get First Result

Use .first() to fetch only the first item, or None if empty:

# Get the first document (only fetches one item)
first_doc = session.list(Document).first()

if first_doc is not None:
    print(f"First document: {first_doc.name}")
else:
    print("No documents found")

Limit Results

Use .limit(n) to cap the total number of items returned:

# Get at most 10 documents
first_ten = session.list(Document).limit(10).all()

# Or iterate over limited results
for doc in session.list(Document).limit(5):
    print(doc.name)

Note: .limit() must be called before iteration begins. It returns a new iterator with the limit applied.
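
To illustrate why a lazily applied limit saves work, the sketch below caps a simulated paginated source with the standard library's itertools.islice. The names fake_pages and fetched_pages are hypothetical stand-ins, not part of the Ragnerock SDK, and the sketch does not claim to show how .limit() is implemented internally.

```python
from itertools import islice

# Records the starting offset of every simulated page request.
fetched_pages = []


def fake_pages(total=1000, page_size=100):
    """Hypothetical paginated source: yields items page by page."""
    for start in range(0, total, page_size):
        fetched_pages.append(start)  # one simulated API call per page
        yield from range(start, min(start + page_size, total))


# Cap the stream at 150 items; iteration stops as soon as the cap is hit.
limited = list(islice(fake_pages(), 150))

print(len(limited), fetched_pages)
```

In this simulation only the two pages needed to produce 150 items are requested, even though the source holds 1000.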

How Pagination Works

The PaginatedIterator manages pagination transparently:

  1. Lazy fetching: No API calls are made until you start iterating or call a method like .all() or .first()
  2. Automatic paging: As you iterate, new pages are fetched automatically when the current page is exhausted
  3. Memory efficient: Only the current page is kept in memory during iteration

# This makes no API calls yet
iterator = session.list(Document)

# First API call happens here (fetches first page)
for doc in iterator:
    print(doc.name)
    # Additional pages fetched automatically as needed
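
The lazy, page-at-a-time behavior described in this section can be sketched in plain Python. Everything here is an assumption made for illustration: LazyPager, fake_fetch, and the offset-based paging scheme are hypothetical and do not reflect the actual PaginatedIterator internals, which may use cursors or another mechanism.

```python
from typing import Callable, Iterator, List


class LazyPager:
    """Hypothetical stand-in for a paginated iterator (not the SDK class)."""

    def __init__(self, fetch_page: Callable[[int, int], List[str]], page_size: int = 100):
        self._fetch_page = fetch_page
        self._page_size = page_size

    def __iter__(self) -> Iterator[str]:
        offset = 0
        while True:
            # A page is requested only when the previous one is exhausted.
            page = self._fetch_page(offset, self._page_size)
            yield from page
            if len(page) < self._page_size:
                return  # a short page means there is nothing left
            offset += len(page)


# Simulated backend with 250 items that logs each "API call".
ITEMS = [f"doc-{i}" for i in range(250)]
calls: List[int] = []


def fake_fetch(offset: int, limit: int) -> List[str]:
    calls.append(offset)
    return ITEMS[offset:offset + limit]


pager = LazyPager(fake_fetch)
calls_before_iteration = len(calls)  # constructing the pager fetches nothing
names = list(pager)                  # iterating triggers the page requests

print(calls_before_iteration, calls, len(names))
```

Because each page is replaced by the next inside the generator, at most one page of items is held by the pager at a time; the caller decides whether to keep them.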

Filtering Results

Pass keyword arguments to session.list() to filter results. Available filters depend on the resource type:

Documents

# List all documents (no filter)
docs = session.list(Document).all()

Annotations

from ragnerock import Annotation

# Filter by document
annotations = session.list(Annotation, document_id="uuid-here").all()

# Filter by chunk
annotations = session.list(Annotation, chunk_id="uuid-here").all()

# Filter by schema (operator)
annotations = session.list(Annotation, schema_id="uuid-here").all()

Resource-Level Listing

Resources also have a .list() method for related resources:

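# Assumes Chunk and Page are exported from the ragnerock package, like Document and Annotation
from ragnerock import Chunk, Page

# List annotations for a specific document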
doc = session.get(Document, name="Apple 10-K")
for annotation in doc.list(Annotation):
    print(annotation.data)

# Filter by operator
for ann in doc.list(Annotation, operator="sentiment"):
    print(ann.data)

# List chunks for a document
for chunk in doc.list(Chunk):
    print(chunk.content[:100])

# List pages for a document
for page in doc.list(Page):
    print(f"Page {page.page_number}: {len(page.content)} chars")

Iteration Patterns

Process in Batches

Since pages are fetched on demand, you can group items into fixed-size batches as they arrive, without holding the full result set in memory:

batch = []
batch_size = 50

for doc in session.list(Document):
    batch.append(doc)

    if len(batch) >= batch_size:
        process_batch(batch)
        batch = []

# Don't forget the last partial batch
if batch:
    process_batch(batch)
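
The same batching pattern can be wrapped in a reusable generator that works with any iterable, which avoids repeating the "last partial batch" check at each call site. This is a minimal sketch; the helper name batched is chosen here for illustration and is not an SDK function.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of up to `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch  # the final batch may be shorter than `size`


# Demonstration with a plain range standing in for a stream of documents.
batches = list(batched(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

On Python 3.12 and later, the standard library's itertools.batched provides similar behavior, yielding tuples instead of lists.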

Early Exit

You can break out of iteration at any time. No additional pages will be fetched:

# Find first document matching a condition
for doc in session.list(Document):
    if "quarterly" in doc.name.lower():
        print(f"Found: {doc.name}")
        break  # Stops pagination
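
An equivalent way to express "find the first match, then stop" is the built-in next() with a generator expression and a default value. The sketch below uses a hypothetical fake_documents generator in place of session.list(Document) so that it runs on its own; the seen list exists only to show how far iteration went.

```python
# Tracks which names were actually pulled from the source.
seen = []


def fake_documents():
    """Hypothetical stand-in for a lazily iterated list of document names."""
    for name in ["Annual Report", "Quarterly Update Q1", "Quarterly Update Q2"]:
        seen.append(name)
        yield name


# next() stops consuming the generator at the first match;
# the second argument is returned if nothing matches.
match = next(
    (name for name in fake_documents() if "quarterly" in name.lower()),
    None,
)

print(match, seen)
```

The third name is never consumed, which mirrors how breaking out of the loop avoids fetching further pages.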

Combine with limit()

For more control, combine .limit() with iteration:

# Process first 100 documents
for doc in session.list(Document).limit(100):
    analyze_document(doc)

Next Steps

  • Session — Learn about all Session methods
  • Resources — Understand Document, Annotation, and other resources
  • SQL Queries — Query annotation data with SQL