Pagination
Iterate over large result sets efficiently with the Ragnerock Python SDK.
When listing resources, the SDK returns a PaginatedIterator that fetches pages of results on demand. This allows you to efficiently work with large datasets without loading everything into memory at once.
Basic Usage
The session.list() method returns a PaginatedIterator:
from ragnerock import create_engine, Session, Document
engine = create_engine("ragnerock://user@example.com:pass@api.ragnerock.com/my_project")
with Session(engine) as session:
    # Iterate over all documents
    for doc in session.list(Document):
        print(doc.name)
Pages are fetched automatically as you iterate. The default page size is 100 items.
Convenience Methods
Get All Results
Use .all() to eagerly fetch all results into a list:
# Fetch all documents at once
all_docs = session.list(Document).all()
print(f"Total documents: {len(all_docs)}")
Get First Result
Use .first() to fetch only the first item. It returns None if there are no results:
# Get the first document (only fetches one item)
first_doc = session.list(Document).first()
if first_doc:
    print(f"First document: {first_doc.name}")
else:
    print("No documents found")
Limit Results
Use .limit(n) to cap the total number of items returned:
# Get at most 10 documents
recent_docs = session.list(Document).limit(10).all()
# Or iterate over limited results
for doc in session.list(Document).limit(5):
    print(doc.name)
Note: .limit() must be called before iteration begins. It returns a new iterator with the limit applied.
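As a rough analogy using only the standard library, capping a lazy iterator means later pages are never requested. The sketch below is not the SDK's implementation: fake_pages is a hypothetical generator standing in for a paginated listing, and itertools.islice plays the role of .limit().

```python
from itertools import islice

# Records the starting offset of every simulated page fetch.
fetched_pages = []


def fake_pages(total=250, page_size=100):
    """Hypothetical stand-in for a paginated listing: yields items page by page."""
    for start in range(0, total, page_size):
        fetched_pages.append(start)  # one simulated API call per page
        yield from range(start, min(start + page_size, total))


# Cap the stream at 5 items, similar in spirit to .limit(5).
capped = list(islice(fake_pages(), 5))

print(capped)         # [0, 1, 2, 3, 4]
print(fetched_pages)  # [0]  (only the first page was ever requested)
```

Because the cap is applied to a lazy stream, the second and third pages in this simulation are never fetched.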
How Pagination Works
The PaginatedIterator manages pagination transparently:
- Lazy fetching: No API calls are made until you start iterating or call a method like .all() or .first()
- Automatic paging: As you iterate, new pages are fetched automatically when the current page is exhausted
- Memory efficient: Only the current page is kept in memory during iteration
# This makes no API calls yet
iterator = session.list(Document)
# First API call happens here (fetches first page)
for doc in iterator:
    print(doc.name)
    # Additional pages fetched automatically as needed
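To make the lazy-fetching behavior concrete, here is a minimal, self-contained sketch of an on-demand pager. It is an illustration only, not the SDK's PaginatedIterator: iterate_pages, fake_fetch, and the offset-based paging scheme are all assumptions made for the example.

```python
from typing import Callable, Iterator, List


def iterate_pages(
    fetch_page: Callable[[int, int], List[str]], page_size: int = 100
) -> Iterator[str]:
    """Yield items one at a time, requesting a new page only when needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)  # one simulated API call
        yield from page                       # only this page is held in memory
        if len(page) < page_size:
            return                            # a short page means nothing is left
        offset += page_size


# Hypothetical in-memory "backend" that records each page request.
DATA = [f"doc-{i}" for i in range(250)]
calls: List[int] = []


def fake_fetch(offset: int, size: int) -> List[str]:
    calls.append(offset)
    return DATA[offset:offset + size]


iterator = iterate_pages(fake_fetch)
calls_before = list(calls)       # []: creating the iterator fetched nothing
first_item = next(iterator)      # the first page is requested here
calls_after_first = list(calls)  # [0]
rest = list(iterator)            # remaining pages are fetched as iteration proceeds

print(calls_before, first_item, calls_after_first, calls)
# [] doc-0 [0] [0, 100, 200]
```

In this simulation, 250 items with a page size of 100 take three page requests, and none of them happens before the first item is asked for.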
Filtering Results
Pass keyword arguments to session.list() to filter results. Available filters depend on the resource type:
Documents
# List all documents (no filter)
docs = session.list(Document).all()
Annotations
from ragnerock import Annotation
# Filter by document
annotations = session.list(Annotation, document_id="uuid-here").all()
# Filter by chunk
annotations = session.list(Annotation, chunk_id="uuid-here").all()
# Filter by schema (operator)
annotations = session.list(Annotation, schema_id="uuid-here").all()
Resource-Level Listing
Resources also have a .list() method for related resources:
# Assumes Chunk and Page are importable from the top-level package, like Document and Annotation
from ragnerock import Chunk, Page
# List annotations for a specific document
doc = session.get(Document, name="Apple 10-K")
for annotation in doc.list(Annotation):
    print(annotation.data)
# Filter by operator
for ann in doc.list(Annotation, operator="sentiment"):
    print(ann.data)
# List chunks for a document
for chunk in doc.list(Chunk):
    print(chunk.content[:100])
# List pages for a document
for page in doc.list(Page):
    print(f"Page {page.page_number}: {len(page.content)} chars")
Iteration Patterns
Process in Batches
Since pages are fetched on demand, you can process items in batches:
batch = []
batch_size = 50
for doc in session.list(Document):
    batch.append(doc)
    if len(batch) >= batch_size:
        process_batch(batch)
        batch = []
# Don't forget the last partial batch
if batch:
    process_batch(batch)
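If you batch in several places, the same loop can be wrapped in a small reusable generator. The helper below is a sketch, not part of the SDK: in_batches is a hypothetical name, and it accepts any iterable, so the result of session.list(Document) could presumably be passed in directly.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items from any iterable into lists of up to batch_size.

    Assumes batch_size >= 1. Items are pulled lazily, one batch at a time.
    """
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # the final partial batch, if any


# range(120) stands in for a paginated listing of 120 items.
sizes = [len(batch) for batch in in_batches(range(120), 50)]
print(sizes)  # [50, 50, 20]
```

Because the helper is itself a generator, it keeps the on-demand behavior: the source iterable is only advanced as each batch is consumed.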
Early Exit
You can break out of the loop at any time. Once you stop iterating, no further pages are fetched:
# Find first document matching a condition
for doc in session.list(Document):
    if "quarterly" in doc.name.lower():
        print(f"Found: {doc.name}")
        break  # Stops pagination
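The same first-match search can be written compactly with the built-in next() and a generator expression, which also stops consuming the source as soon as a match is found. The sketch below uses a hypothetical names_stream generator in place of a real listing, so it runs without the SDK.

```python
# Records every item the search actually pulled from the stream.
seen = []


def names_stream():
    """Hypothetical stand-in for iterating over document names."""
    for name in ["Annual Report", "Quarterly Update Q1", "Quarterly Update Q2"]:
        seen.append(name)
        yield name


# next() returns the first matching item, or None if nothing matches.
match = next((n for n in names_stream() if "quarterly" in n.lower()), None)

print(match)  # Quarterly Update Q1
print(seen)   # ['Annual Report', 'Quarterly Update Q1']
```

The third name is never pulled from the stream, which mirrors how breaking out of the loop avoids fetching further pages.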
Combine with limit()
For more control, combine .limit() with iteration:
# Process first 100 documents
for doc in session.list(Document).limit(100):
    analyze_document(doc)
Next Steps
- Session — Learn about all Session methods
- Resources — Understand Document, Annotation, and other resources
- SQL Queries — Query annotation data with SQL