Automate E-commerce Product Cataloging with AI

Learn how to build an automated product cataloging system that extracts product attributes, listing copy, and category data from images using ViscribeAI's Python library.

The Challenge: Manual Product Cataloging

E-commerce businesses face a constant challenge: cataloging thousands of products with accurate descriptions, specifications, and categories. Manual data entry is time-consuming, error-prone, and doesn't scale. What if AI could handle most of this work, leaving people to review only the uncertain cases?

The Solution: ViscribeAI

ViscribeAI handles product cataloging with one schema-driven extraction call. You define the output fields, and the same call can cover:

Structured Fields: Pull product specs, prices, and features from images
Listing Copy: Generate concise product summaries inside your schema
Category Data: Route products into your taxonomy with confidence scores
Quality Signals: Capture visible evidence and review flags in the same response

Implementation: Step-by-Step Guide

1. Installation and Setup

First, install the ViscribeAI Python library:

bash

pip install viscribe

Set the OPENAI_API_KEY environment variable, then import Pydantic (used to define output schemas) and the extract helper:

python

from pydantic import BaseModel, Field
from viscribe.images import extract

2. Extract Product Information

Define the exact catalog shape you want back from the product image:

python

class ProductCatalogEntry(BaseModel):
    product_name: str = Field(description="Customer-facing product name")
    brand: str | None = Field(description="Visible product brand, if present")
    price: float | None = Field(description="Visible price in USD, if present")
    key_features: list[str] = Field(description="Visible product features")
    color: str | None = Field(description="Primary visible color")
    listing_summary: str = Field(description="Short product summary for a catalog page")
    tags: list[str] = Field(description="Search and merchandising tags")


result = extract(
    image_url="https://example.com/laptop.jpg",
    output_schema=ProductCatalogEntry,
    instruction="Extract catalog-ready product data. Use null for values that are not visible."
)

product = result.data
print(product.model_dump())
# Example output (actual values depend on the image): {
#     "product_name": "Dell XPS 15 Laptop",
#     "brand": "Dell",
#     "price": 1299.99,
#     "key_features": ["15.6 inch display", "Intel i7", "16GB RAM", "512GB SSD"],
#     "color": "Silver",
#     "listing_summary": "A sleek silver laptop for professional productivity.",
#     "tags": ["laptop", "electronics", "silver", "ultrabook"]
# }
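Model output can contain near-duplicate tags that differ only in case or spacing, which clutters search facets. A minimal stdlib sketch of a post-processing step, assuming you work with the plain dict returned by `product.model_dump()`; `normalize_tags` is a hypothetical helper, not part of ViscribeAI:

```python
def normalize_tags(tags: list[str]) -> list[str]:
    """Lowercase, trim, and de-duplicate tags while keeping first-seen order."""
    seen: set[str] = set()
    normalized: list[str] = []
    for tag in tags:
        # Collapse whitespace and case so "Ultra Book" and "ultra  book" match
        cleaned = " ".join(tag.split()).lower()
        if cleaned and cleaned not in seen:  # skip empty strings and repeats
            seen.add(cleaned)
            normalized.append(cleaned)
    return normalized


# Hypothetical extracted record with messy tags
product = {
    "product_name": "Dell XPS 15 Laptop",
    "tags": ["Laptop", "electronics", "laptop ", "Silver", ""],
}
product["tags"] = normalize_tags(product["tags"])
print(product["tags"])
# ['laptop', 'electronics', 'silver']
```

Keeping first-seen order preserves whatever priority the model gave its tags.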

3. Add Category Routing

Define a dedicated schema for taxonomy decisions (or add these fields to ProductCatalogEntry to get everything from one extraction call):

python

class CategoryDecision(BaseModel):
    category_path: str = Field(description="Best matching category path from the allowed taxonomy")
    confidence: float = Field(description="Confidence score from 0 to 1")
    evidence: list[str] = Field(description="Visible clues that justify the category")
    needs_review: bool = Field(description="True when the image is ambiguous")


result = extract(
    image_url="https://example.com/laptop.jpg",
    output_schema=CategoryDecision,
    instruction=(
        "Choose exactly one category from: Electronics > Computers > Laptops, "
        "Electronics > Tablets, Electronics > Accessories, Office Supplies, Gaming Equipment."
    )
)

print(result.data.category_path)
# "Electronics > Computers > Laptops"
print(result.data.confidence)
# 0.97
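The instruction lists the allowed categories, but nothing in the schema forces the model to pick one of them, so it is worth checking the answer in code before publishing. A sketch that operates on `result.data.model_dump()`; the 0.8 confidence threshold and the `route_category` helper are assumptions to tune against items your team has already reviewed:

```python
ALLOWED_CATEGORIES = {
    "Electronics > Computers > Laptops",
    "Electronics > Tablets",
    "Electronics > Accessories",
    "Office Supplies",
    "Gaming Equipment",
}

# Hypothetical cutoff; calibrate on a reviewed sample
CONFIDENCE_THRESHOLD = 0.8


def route_category(decision: dict) -> str:
    """Return 'publish' or 'review' for a dict shaped like CategoryDecision."""
    if decision["category_path"] not in ALLOWED_CATEGORIES:
        return "review"  # model answered outside the taxonomy
    if decision["needs_review"] or decision["confidence"] < CONFIDENCE_THRESHOLD:
        return "review"  # model flagged ambiguity or is not confident enough
    return "publish"


print(route_category({"category_path": "Electronics > Computers > Laptops",
                      "confidence": 0.97, "evidence": [], "needs_review": False}))
# publish
print(route_category({"category_path": "Electronics > Laptops",
                      "confidence": 0.99, "evidence": [], "needs_review": False}))
# review
```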

4. Capture Review Signals

For production systems, include fields that tell downstream tools when a human should review the item:

python

class ProductReview(BaseModel):
    visible_defects: list[str] = Field(description="Visible damage or packaging issues")
    missing_information: list[str] = Field(description="Important catalog details not visible")
    needs_manual_review: bool = Field(description="True if the image is unclear or incomplete")
    reviewer_note: str = Field(description="Short note for the catalog operations team")


result = extract(
    image_url="https://example.com/product.jpg",
    output_schema=ProductReview,
    instruction="Inspect the product image for catalog quality issues before publishing."
)

print(result.data.reviewer_note)
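Downstream, these fields can decide which items go straight to the storefront and which wait for a person. A minimal sketch, assuming you have collected `result.data.model_dump()` per product ID; `split_for_review` and the SKU IDs are hypothetical:

```python
def split_for_review(reviews: dict[str, dict]) -> tuple[list[str], list[str]]:
    """Split product IDs into (ready, held) using ProductReview-shaped dicts."""
    ready: list[str] = []
    held: list[str] = []
    for product_id, review in reviews.items():
        flagged = (
            review["needs_manual_review"]
            or bool(review["visible_defects"])      # any reported defect holds the item
            or bool(review["missing_information"])  # so does any missing catalog detail
        )
        (held if flagged else ready).append(product_id)
    return ready, held


reviews = {
    "sku-001": {"visible_defects": [], "missing_information": [],
                "needs_manual_review": False, "reviewer_note": "Clear front shot."},
    "sku-002": {"visible_defects": ["dented box corner"], "missing_information": [],
                "needs_manual_review": False, "reviewer_note": "Packaging damage visible."},
    "sku-003": {"visible_defects": [], "missing_information": ["model number"],
                "needs_manual_review": True, "reviewer_note": "Label is blurred."},
}
ready, held = split_for_review(reviews)
print(ready)  # ['sku-001']
print(held)   # ['sku-002', 'sku-003']
```

Treating any non-empty defect or missing-information list as a hold is deliberately conservative; loosen it once you trust the signals.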

Complete Workflow Example

Here's a complete function that combines product data, category routing, and review signals in one extraction call and returns the result as a dict:

python

from pydantic import BaseModel, Field
from viscribe.images import extract

class CatalogEntry(BaseModel):
    name: str
    brand: str | None
    category_path: str
    features: list[str]
    description: str
    tags: list[str]
    visible_quality_issues: list[str]
    confidence: float
    needs_review: bool


def catalog_product(image_url: str) -> dict:
    result = extract(
        image_url=image_url,
        output_schema=CatalogEntry,
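        instruction=(
            "Create a publishable catalog record. Choose exactly one category from: "
            "Electronics > Computers > Laptops, Electronics > Tablets, "
            "Electronics > Accessories, Office Supplies, Gaming Equipment. "
            "Write concise listing copy, use null for details that are not visible, "
            "and flag uncertainty for manual review."
        )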
    )

    return result.data.model_dump()


catalog_entry = catalog_product("https://example.com/product.jpg")
print(catalog_entry)
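Many storefront platforms import products from CSV files. A stdlib sketch that writes `catalog_product()` results to CSV text; the column order and the `|` separator for list fields are assumptions, so match them to your platform's import template:

```python
import csv
import io

# Column order is an assumption; align it with your storefront's import format
CSV_FIELDS = ["name", "brand", "category_path", "features", "description",
              "tags", "visible_quality_issues", "confidence", "needs_review"]


def entries_to_csv(entries: list[dict]) -> str:
    """Serialize CatalogEntry-shaped dicts to CSV text, joining list fields with '|'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS)
    writer.writeheader()
    for entry in entries:
        # Flatten list fields into one cell; the csv module writes None as an empty cell
        row = {key: "|".join(value) if isinstance(value, list) else value
               for key, value in entry.items()}
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical record shaped like catalog_product()'s return value
sample_entry = {
    "name": "Dell XPS 15 Laptop",
    "brand": "Dell",
    "category_path": "Electronics > Computers > Laptops",
    "features": ["15.6 inch display", "16GB RAM"],
    "description": "A sleek silver laptop for professional productivity.",
    "tags": ["laptop", "silver"],
    "visible_quality_issues": [],
    "confidence": 0.97,
    "needs_review": False,
}
print(entries_to_csv([sample_entry]))
```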

Real-World Use Cases

1. Marketplace Sellers

Automatically catalog products from supplier images when importing inventory. This can cut per-product onboarding from hours of manual entry to minutes of review.

2. Dropshipping Businesses

Extract product information from manufacturer images and rewrite it into original listing copy, avoiding duplicate-content issues.

3. Inventory Management Systems

Process images of warehouse items to maintain accurate inventory records with proper categorization and descriptions.

4. Product Research Sites

Extract specifications from multiple product images to build comparison tables automatically.
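For the comparison-table use case, each extracted record becomes one column. A stdlib sketch that renders aligned plain text, assuming records are dicts keyed by spec name; `comparison_table` and the two laptops are hypothetical:

```python
def comparison_table(products: list[dict], fields: list[str]) -> str:
    """Render one row per spec field and one column per product as aligned text."""
    header = ["field"] + [p["name"] for p in products]
    rows = [header]
    for field in fields:
        # Missing or null specs show as "-" so gaps stay visible
        rows.append([field] + [
            "-" if p.get(field) is None else str(p[field]) for p in products
        ])
    # Pad every column to its widest cell
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in rows]
    return "\n".join(lines)


laptops = [
    {"name": "Laptop A", "processor": "Intel i7", "ram": "16GB", "storage": "512GB SSD"},
    {"name": "Laptop B", "processor": "AMD Ryzen 7", "ram": "32GB"},
]
print(comparison_table(laptops, ["processor", "ram", "storage"]))
```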

Advanced: Handling Complex Products

For products with complex nested data, pass a richer Pydantic model as the output schema:

python

from pydantic import BaseModel, Field
from typing import List, Optional
from viscribe.images import extract

class Specification(BaseModel):
    processor: str
    ram: str
    storage: str
    display: str
    graphics: Optional[str]

class Product(BaseModel):
    name: str
    brand: str
    price: Optional[float]
    specifications: Specification
    warranty_years: Optional[int]
    evidence: List[str] = Field(description="Visible clues used to fill the record")

result = extract(
    image_url="https://example.com/laptop-specs.jpg",
    output_schema=Product,
    instruction="Extract a typed laptop catalog record from the visible image and spec panel."
)

product = result.data
print(f"Processor: {product.specifications.processor}")
print(f"RAM: {product.specifications.ram}")

Performance Tips

Batch Processing: Use aextract to process multiple images concurrently
Caching: Store results to avoid re-processing the same images
Error Handling: Implement retry logic for failed requests
Rate Limiting: Respect API limits and implement exponential backoff
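The caching, retry, and backoff tips can be combined in a small wrapper. This is a stdlib sketch, not ViscribeAI functionality: `with_retries` and `cache_key` are hypothetical helpers, and `flaky_extract` stands in for a real call. In practice you would pass something like `lambda: extract(...)` and retry only errors you know are transient; the broad `except` here is for illustration.

```python
import hashlib
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 4,
                 base_delay: float = 1.0, max_delay: float = 30.0) -> T:
    """Run call(), retrying failures with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # illustration only: narrow this to transient errors
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Delay ceiling doubles each attempt (base, 2*base, 4*base, ...), capped
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
    raise RuntimeError("attempts must be at least 1")


def cache_key(image_url: str, instruction: str, schema_name: str) -> str:
    """Same image, instruction, and schema always map to the same key."""
    raw = "\n".join([image_url, instruction, schema_name])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


calls = {"count": 0}

def flaky_extract() -> str:
    # Stand-in for an extraction call: fails twice, then succeeds
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "catalog record"


cache: dict[str, str] = {}
key = cache_key("https://example.com/laptop.jpg",
                "Extract catalog-ready product data.", "ProductCatalogEntry")
if key not in cache:
    cache[key] = with_retries(flaky_extract, base_delay=0.01)
print(cache[key], calls["count"])
# catalog record 3
```

Including the instruction and schema name in the key means that changing either one triggers a fresh extraction instead of returning a stale cached record.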

Async Processing for Scale

When processing thousands of products, use aextract for better throughput:

python

import asyncio
from pydantic import BaseModel
from viscribe.images import aextract

class ProductSummary(BaseModel):
    name: str
    description: str
    tags: list[str]

async def process_products(image_urls):
    tasks = [
        aextract(
            image_url=url,
            output_schema=ProductSummary,
            instruction="Extract a concise catalog summary and searchable tags."
        )
        for url in image_urls
    ]

    results = await asyncio.gather(*tasks)
    return [result.data.model_dump() for result in results]

# Process all products concurrently
image_urls = ["https://example.com/product1.jpg", ...]
results = asyncio.run(process_products(image_urls))
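`asyncio.gather` in the example above starts every request at once, which can exceed provider rate limits on large batches. A common pattern is to cap in-flight requests with `asyncio.Semaphore`. The sketch below uses a stand-in `fake_extract` coroutine so it runs without an API key; in practice the worker would await `aextract(...)` and return `result.data.model_dump()`. `gather_bounded` and the limit of 5 are hypothetical choices:

```python
import asyncio


async def gather_bounded(items, worker, limit: int = 10):
    """Run worker(item) for every item with at most `limit` running at once.

    Results come back in input order. Exceptions are returned, not raised,
    so one unreadable image does not discard the rest of the batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(item):
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run(item) for item in items), return_exceptions=True)


# Stand-in for aextract that records how many calls are in flight
in_flight = 0
peak = 0

async def fake_extract(url: str) -> str:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    if url.endswith("bad.jpg"):
        raise ValueError(f"could not read {url}")
    return f"summary for {url}"


urls = [f"https://example.com/product{i}.jpg" for i in range(25)]
urls.append("https://example.com/bad.jpg")
results = asyncio.run(gather_bounded(urls, fake_extract, limit=5))
failures = [r for r in results if isinstance(r, Exception)]
print(len(results), len(failures), peak)
# 26 1 5
```

Returning exceptions instead of raising lets you log or retry the failed URLs separately while keeping the successful results.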

Measuring Success

Companies using ViscribeAI for product cataloging typically report the following, though results depend on image quality and catalog complexity:

80-90% reduction in manual data entry time
95%+ accuracy in extracted data
Consistent quality across thousands of products
Faster time-to-market for new products
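To check an accuracy figure against your own catalog, compare extracted records with a small hand-labeled sample. A stdlib sketch computing exact-match accuracy per field; the sample records are made up, and exact matching is strict, so free-text fields such as descriptions would need a looser comparison:

```python
def field_accuracy(predicted: list[dict], labeled: list[dict],
                   fields: list[str]) -> dict[str, float]:
    """Fraction of records where each field exactly matches the human label."""
    if not labeled or len(predicted) != len(labeled):
        raise ValueError("need a non-empty sample aligned record-for-record")
    scores = {}
    for field in fields:
        # True counts as 1, so summing booleans counts matches
        matches = sum(p.get(field) == gold.get(field)
                      for p, gold in zip(predicted, labeled))
        scores[field] = matches / len(labeled)
    return scores


# Hypothetical extracted records and human labels for the same two images
predicted = [
    {"brand": "Dell", "category_path": "Electronics > Computers > Laptops", "color": "Silver"},
    {"brand": None, "category_path": "Electronics > Accessories", "color": "Black"},
]
labeled = [
    {"brand": "Dell", "category_path": "Electronics > Computers > Laptops", "color": "Silver"},
    {"brand": "Acme", "category_path": "Electronics > Accessories", "color": "Black"},
]
print(field_accuracy(predicted, labeled, ["brand", "category_path", "color"]))
# {'brand': 0.5, 'category_path': 1.0, 'color': 1.0}
```

Per-field scores show where to focus: a low score on one field usually points to a schema description or instruction worth tightening.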

Start Building

Ready to automate your product cataloging? Install the library, choose the model provider you already use, and explore the full documentation for examples and best practices. You can also tell us what you are building through the community form.
