Learn how to build an automated product cataloging system that extracts product attributes, listing copy, and category data from images using ViscribeAI's Python library.
The Challenge: Manual Product Cataloging
E-commerce businesses face a constant challenge: efficiently cataloging thousands of products with accurate descriptions, specifications, and categories. Manual data entry is time-consuming, error-prone, and doesn't scale. What if you could automate 90% of this process using AI?
The Solution: ViscribeAI
ViscribeAI uses one schema-driven extraction endpoint to create a complete product cataloging solution:
- Structured Fields: Pull product specs, prices, and features from images
- Listing Copy: Generate concise product summaries inside your schema
- Category Data: Route products into your taxonomy with confidence notes
- Quality Signals: Capture visible evidence and review flags in the same response
Implementation: Step-by-Step Guide
1. Installation and Setup
First, install the ViscribeAI Python library:
pip install viscribe

Set OPENAI_API_KEY in your environment, then import the extract helper:

from pydantic import BaseModel, Field
from viscribe.images import extract

2. Extract Product Information
Define the exact catalog shape you want back from the product image:
class ProductCatalogEntry(BaseModel):
    product_name: str = Field(description="Customer-facing product name")
    brand: str | None = Field(description="Visible product brand, if present")
    price: float | None = Field(description="Visible price in USD, if present")
    key_features: list[str] = Field(description="Visible product features")
    color: str | None = Field(description="Primary visible color")
    listing_summary: str = Field(description="Short product summary for a catalog page")
    tags: list[str] = Field(description="Search and merchandising tags")

result = extract(
    image_url="https://example.com/laptop.jpg",
    output_schema=ProductCatalogEntry,
    instruction="Extract catalog-ready product data. Use null for values that are not visible."
)
product = result.data
print(product.model_dump())
# Output: {
# "product_name": "Dell XPS 15 Laptop",
# "brand": "Dell",
# "price": 1299.99,
# "key_features": ["15.6 inch display", "Intel i7", "16GB RAM", "512GB SSD"],
# "color": "Silver",
# "listing_summary": "A sleek silver laptop for professional productivity.",
# "tags": ["laptop", "electronics", "silver", "ultrabook"]
# }

3. Add Category Routing
Route the product into your taxonomy with a dedicated schema. The same fields can also be added to ProductCatalogEntry if you prefer a single extraction call:
class CategoryDecision(BaseModel):
    category_path: str = Field(description="Best matching category path from the allowed taxonomy")
    confidence: float = Field(description="Confidence score from 0 to 1")
    evidence: list[str] = Field(description="Visible clues that justify the category")
    needs_review: bool = Field(description="True when the image is ambiguous")

result = extract(
    image_url="https://example.com/laptop.jpg",
    output_schema=CategoryDecision,
    instruction=(
        "Choose exactly one category from: Electronics > Computers > Laptops, "
        "Electronics > Tablets, Electronics > Accessories, Office Supplies, Gaming Equipment."
    )
)
print(result.data.category_path)
# "Electronics > Computers > Laptops"
print(result.data.confidence)
# 0.97

4. Capture Review Signals
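Review signals only help if something downstream acts on them. As a minimal sketch, the gate below routes a category decision to "publish" or "review" using the confidence and needs_review fields from the previous step. The Decision dataclass is a stand-in for the CategoryDecision result, and REVIEW_THRESHOLD and route are hypothetical names chosen for illustration, not part of the ViscribeAI library.

```python
from dataclasses import dataclass

# Hypothetical cutoff; tune it against items your team has already reviewed.
REVIEW_THRESHOLD = 0.85

# The taxonomy listed in the category-routing instruction.
ALLOWED_CATEGORIES = {
    "Electronics > Computers > Laptops",
    "Electronics > Tablets",
    "Electronics > Accessories",
    "Office Supplies",
    "Gaming Equipment",
}


@dataclass
class Decision:
    """Stand-in for the fields returned by the CategoryDecision schema."""
    category_path: str
    confidence: float
    needs_review: bool


def route(decision: Decision) -> str:
    """Return 'publish' or 'review' for a single category decision."""
    if decision.category_path not in ALLOWED_CATEGORIES:
        return "review"  # the answer falls outside the allowed taxonomy
    if decision.needs_review or decision.confidence < REVIEW_THRESHOLD:
        return "review"  # flagged as ambiguous, or confidence is too low
    return "publish"


print(route(Decision("Electronics > Computers > Laptops", 0.97, False)))
print(route(Decision("Electronics > Tablets", 0.60, False)))
```

Checking the category against the allowed set guards against answers the instruction did not offer. A model-reported confidence is a rough signal at best, so the threshold is worth calibrating on a reviewed sample.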
For production systems, include fields that tell downstream tools when a human should review the item:
class ProductReview(BaseModel):
    visible_defects: list[str] = Field(description="Visible damage or packaging issues")
    missing_information: list[str] = Field(description="Important catalog details not visible")
    needs_manual_review: bool = Field(description="True if the image is unclear or incomplete")
    reviewer_note: str = Field(description="Short note for the catalog operations team")

result = extract(
    image_url="https://example.com/product.jpg",
    output_schema=ProductReview,
    instruction="Inspect the product image for catalog quality issues before publishing."
)
print(result.data.reviewer_note)

Complete Workflow Example
Here's a complete function that processes a product image and returns all catalog information:
from pydantic import BaseModel, Field
from viscribe.images import extract

class CatalogEntry(BaseModel):
    name: str
    brand: str | None
    category_path: str
    features: list[str]
    description: str
    tags: list[str]
    visible_quality_issues: list[str]
    confidence: float
    needs_review: bool

def catalog_product(image_url: str) -> dict:
    result = extract(
        image_url=image_url,
        output_schema=CatalogEntry,
        instruction=(
            "Create a publishable catalog record. Choose a category from the store taxonomy, "
            "write concise listing copy, and flag uncertainty for manual review."
        )
    )
    return result.data.model_dump()

catalog_entry = catalog_product("https://example.com/product.jpg")
print(catalog_entry)

Real-World Use Cases
1. Marketplace Sellers
Automatically catalog products from supplier images when importing inventory. This reduces onboarding time from hours to minutes per product.
2. Dropshipping Businesses
Extract and rewrite product information from manufacturer websites to create unique listings that avoid duplicate content issues.
3. Inventory Management Systems
Process images of warehouse items to maintain accurate inventory records with proper categorization and descriptions.
4. Product Research Sites
Extract specifications from multiple product images to build comparison tables automatically.
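As a rough illustration of the comparison-table idea, the sketch below lines up already-extracted records side by side. It assumes each record is a plain dict (for example from model_dump()) with a name key. The comparison_table helper and the sample records are made up for this example and are not extraction output.

```python
def comparison_table(products: list[dict], fields: list[str]) -> str:
    """Render extracted records as plain text: one row per field, one column per product."""
    header = ["Spec"] + [str(p.get("name", "?")) for p in products]
    rows = [header]
    for field in fields:
        cells = []
        for p in products:
            value = p.get(field)
            # Missing or null values show up as "n/a" instead of "None".
            cells.append("n/a" if value is None else str(value))
        rows.append([field] + cells)
    # Pad every column to its widest cell so the columns line up.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


# Illustrative records only.
products = [
    {"name": "Laptop A", "ram": "16GB", "storage": "512GB SSD", "graphics": None},
    {"name": "Laptop B", "ram": "32GB", "storage": "1TB SSD", "graphics": "Discrete GPU"},
]
print(comparison_table(products, ["ram", "storage", "graphics"]))
```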
Advanced: Handling Complex Products
For products with complex nested data, pass a richer Pydantic model as the output schema:
from pydantic import BaseModel, Field
from typing import List, Optional
from viscribe.images import extract

class Specification(BaseModel):
    processor: str
    ram: str
    storage: str
    display: str
    graphics: Optional[str]

class Product(BaseModel):
    name: str
    brand: str
    price: Optional[float]
    specifications: Specification
    warranty_years: Optional[int]
    evidence: List[str] = Field(description="Visible clues used to fill the record")

result = extract(
    image_url="https://example.com/laptop-specs.jpg",
    output_schema=Product,
    instruction="Extract a typed laptop catalog record from the visible image and spec panel."
)
product = result.data
print(f"Processor: {product.specifications.processor}")
print(f"RAM: {product.specifications.ram}")

Performance Tips
- Batch Processing: Use the async aextract helper to process multiple images concurrently
- Caching: Store results to avoid re-processing the same images
- Error Handling: Implement retry logic for failed requests
- Rate Limiting: Respect API limits and implement exponential backoff
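The tips above can be combined in one small helper. The following is a sketch under stated assumptions, not ViscribeAI functionality: process_all, its parameters, and fake_worker are hypothetical names, and worker stands for any coroutine that takes an image URL and returns a record. It catches every Exception for brevity; real code would retry only the transient errors your provider raises.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable

Worker = Callable[[str], Awaitable[Any]]


async def process_all(
    urls: list[str],
    worker: Worker,
    cache: dict[str, Any],
    max_concurrency: int = 5,
    max_attempts: int = 4,
    base_delay: float = 0.5,
) -> dict[str, Any]:
    """Run worker(url) per URL with caching, bounded concurrency, and retries."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(url: str) -> None:
        if url in cache:  # caching: skip images that were already processed
            return
        async with semaphore:  # rate limiting: cap the number of in-flight calls
            for attempt in range(max_attempts):
                try:
                    cache[url] = await worker(url)
                    return
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts; let the caller see the error
                    # Exponential backoff with jitter: base, 2x base, 4x base, ...
                    delay = base_delay * 2 ** attempt
                    await asyncio.sleep(delay + random.uniform(0, delay / 2))

    # set() removes duplicate URLs so each image is requested at most once.
    await asyncio.gather(*(run_one(url) for url in set(urls)))
    return {url: cache[url] for url in urls}


async def fake_worker(url: str) -> dict:
    """Stand-in for a real extraction call; returns a dummy record."""
    return {"source": url}


results = asyncio.run(process_all(["a.jpg", "b.jpg", "a.jpg"], fake_worker, cache={}))
print(results)
```

In practice the worker could be a thin coroutine that awaits aextract and returns result.data.model_dump(). Passing a persistent mapping instead of a plain dict as cache would let results survive between runs.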
Async Processing for Scale
When processing thousands of products, use aextract for better throughput:
import asyncio
from pydantic import BaseModel
from viscribe.images import aextract

class ProductSummary(BaseModel):
    name: str
    description: str
    tags: list[str]

async def process_products(image_urls):
    # gather starts every request at once; cap concurrency for very large batches.
    tasks = [
        aextract(
            image_url=url,
            output_schema=ProductSummary,
            instruction="Extract a concise catalog summary and searchable tags."
        )
        for url in image_urls
    ]
    results = await asyncio.gather(*tasks)
    return [result.data.model_dump() for result in results]

# Process 100 products concurrently ("..." stands in for the remaining URLs)
image_urls = ["https://example.com/product1.jpg", ...]
results = asyncio.run(process_products(image_urls))

Measuring Success
Companies using ViscribeAI for product cataloging typically see:
- 80-90% reduction in manual data entry time
- 95%+ accuracy in extracted data
- Consistent quality across thousands of products
- Faster time-to-market for new products
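Figures like these depend on the catalog, so it is worth measuring on your own data. One simple approach, sketched below, is to hand-verify a sample of products and compute per-field accuracy against the extracted records. The field_accuracy and normalize helpers and the sample records are hypothetical, written only to illustrate the calculation.

```python
def normalize(value):
    """Compare strings case-insensitively and ignore surrounding whitespace."""
    if isinstance(value, str):
        return value.strip().lower()
    return value


def field_accuracy(
    extracted: list[dict], verified: list[dict], fields: list[str]
) -> dict[str, float]:
    """Share of records in which each extracted field matches the verified value."""
    if len(extracted) != len(verified) or not verified:
        raise ValueError("need two non-empty lists of equal length")
    scores = {}
    for field in fields:
        matches = sum(
            normalize(e.get(field)) == normalize(v.get(field))
            for e, v in zip(extracted, verified)
        )
        scores[field] = matches / len(verified)
    return scores


# Illustrative records only: extracted[i] and verified[i] describe the same product.
extracted = [
    {"brand": "Acme ", "color": "Silver"},
    {"brand": "Globex", "color": "Black"},
]
verified = [
    {"brand": "acme", "color": "silver"},
    {"brand": "Globex", "color": "Grey"},
]
print(field_accuracy(extracted, verified, ["brand", "color"]))
```

Exact matching suits short fields such as brand or color. Free-text fields like listing copy usually need human judgment or a looser similarity measure.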
Start Building
Ready to automate your product cataloging? Install the library, choose the model provider you already use, and explore the full documentation for examples and best practices. You can also tell us what you are building through the community form.


