
Automate E-commerce Product Cataloging with ViscribeAI

June 8, 2026
ViscribeAI Team
Tutorial

Learn how to build an automated product cataloging system that extracts product attributes, listing copy, and category data from images using ViscribeAI's Python library.

The Challenge: Manual Product Cataloging

E-commerce businesses face a constant challenge: cataloging thousands of products with accurate descriptions, specifications, and categories. Manual data entry is slow, error-prone, and doesn't scale. What if you could automate up to 90% of this process using AI?

The Solution: ViscribeAI

ViscribeAI uses one schema-driven extraction endpoint to create a complete product cataloging solution:

  • Structured Fields: Pull product specs, prices, and features from images
  • Listing Copy: Generate concise product summaries inside your schema
  • Category Data: Route products into your taxonomy with confidence notes
  • Quality Signals: Capture visible evidence and review flags in the same response

Implementation: Step-by-Step Guide

1. Installation and Setup

First, install the ViscribeAI Python library:

bash
pip install viscribe

Set OPENAI_API_KEY in your environment, then import the extract helper:

python
from pydantic import BaseModel, Field
from viscribe.images import extract
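
Before the first call, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch using only the standard library. `missing_env_vars` is a hypothetical helper, not part of the viscribe package, and `OPENAI_API_KEY` is the only name taken from the setup step above.

```python
import os
from collections.abc import Mapping
from typing import Optional


def missing_env_vars(names: list[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# Hypothetical startup check: report the problem early instead of failing mid-batch.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Set these environment variables before calling extract: {missing}")
```

If you use a different model provider, swap in whichever variable that provider expects.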

2. Extract Product Information

Define the exact catalog shape you want back from the product image:

python
class ProductCatalogEntry(BaseModel):
    product_name: str = Field(description="Customer-facing product name")
    brand: str | None = Field(description="Visible product brand, if present")
    price: float | None = Field(description="Visible price in USD, if present")
    key_features: list[str] = Field(description="Visible product features")
    color: str | None = Field(description="Primary visible color")
    listing_summary: str = Field(description="Short product summary for a catalog page")
    tags: list[str] = Field(description="Search and merchandising tags")


result = extract(
    image_url="https://example.com/laptop.jpg",
    output_schema=ProductCatalogEntry,
    instruction="Extract catalog-ready product data. Use null for values that are not visible."
)

product = result.data
print(product.model_dump())
# Example output (values will vary by image): {
#     "product_name": "Dell XPS 15 Laptop",
#     "brand": "Dell",
#     "price": 1299.99,
#     "key_features": ["15.6 inch display", "Intel i7", "16GB RAM", "512GB SSD"],
#     "color": "Silver",
#     "listing_summary": "A sleek silver laptop for professional productivity.",
#     "tags": ["laptop", "electronics", "silver", "ultrabook"]
# }
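
Model-generated tags and prices often need light cleanup before they go into a catalog. The sketch below shows one possible approach on the plain dict returned by `model_dump()`. `normalize_tags` and `clean_record` are illustrative helpers, not viscribe functions, and the sample record is made up.

```python
def normalize_tags(tags: list[str]) -> list[str]:
    """Lowercase, trim, and de-duplicate tags while keeping first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for tag in tags:
        key = " ".join(tag.lower().split())  # collapse inner and outer whitespace
        if key and key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


def clean_record(record: dict) -> dict:
    """Return a copy of an extracted record with tidy tags and a 2-decimal price."""
    cleaned = dict(record)
    cleaned["tags"] = normalize_tags(record.get("tags", []))
    if record.get("price") is not None:
        cleaned["price"] = round(float(record["price"]), 2)
    return cleaned


# Made-up record in the shape of ProductCatalogEntry.model_dump()
record = {
    "product_name": "Dell XPS 15 Laptop",
    "price": 1299.99,
    "tags": ["Laptop", " laptop ", "Silver", ""],
}
print(clean_record(record)["tags"])  # ['laptop', 'silver']
```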

3. Add Category Routing

Route the product into your taxonomy with a second schema and an instruction that lists the allowed categories:

python
class CategoryDecision(BaseModel):
    category_path: str = Field(description="Best matching category path from the allowed taxonomy")
    confidence: float = Field(description="Confidence score from 0 to 1")
    evidence: list[str] = Field(description="Visible clues that justify the category")
    needs_review: bool = Field(description="True when the image is ambiguous")


result = extract(
    image_url="https://example.com/laptop.jpg",
    output_schema=CategoryDecision,
    instruction=(
        "Choose exactly one category from: Electronics > Computers > Laptops, "
        "Electronics > Tablets, Electronics > Accessories, Office Supplies, Gaming Equipment."
    )
)

print(result.data.category_path)
# "Electronics > Computers > Laptops"
print(result.data.confidence)
# 0.97
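
A model can still return a path that is not in your list, or one it is unsure about, so a small post-check is worth considering. The sketch below assumes the result has been converted with `result.data.model_dump()`. `check_category` and `REVIEW_THRESHOLD` are illustrative names rather than viscribe features, and the 0.8 cutoff is an arbitrary starting point.

```python
# Same category list that the instruction above passes to the model.
ALLOWED_CATEGORIES = {
    "Electronics > Computers > Laptops",
    "Electronics > Tablets",
    "Electronics > Accessories",
    "Office Supplies",
    "Gaming Equipment",
}

REVIEW_THRESHOLD = 0.8  # hypothetical cutoff; tune it against your own data


def check_category(decision: dict) -> dict:
    """Accept a category only if it is in the taxonomy; flag anything doubtful."""
    path = decision["category_path"].strip()
    in_taxonomy = path in ALLOWED_CATEGORIES
    confident = decision["confidence"] >= REVIEW_THRESHOLD
    needs_review = bool(decision.get("needs_review", False)) or not in_taxonomy or not confident
    return {
        "category_path": path if in_taxonomy else None,
        "needs_review": needs_review,
    }


print(check_category({"category_path": "Electronics > Phones", "confidence": 0.99}))
# {'category_path': None, 'needs_review': True}
```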

4. Capture Review Signals

For production systems, include fields that tell downstream tools when a human should review the item:

python
class ProductReview(BaseModel):
    visible_defects: list[str] = Field(description="Visible damage or packaging issues")
    missing_information: list[str] = Field(description="Important catalog details not visible")
    needs_manual_review: bool = Field(description="True if the image is unclear or incomplete")
    reviewer_note: str = Field(description="Short note for the catalog operations team")


result = extract(
    image_url="https://example.com/product.jpg",
    output_schema=ProductReview,
    instruction="Inspect the product image for catalog quality issues before publishing."
)

print(result.data.reviewer_note)
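
These fields become useful once something acts on them. As a rough sketch, a downstream step could map the review signals to a queue. The queue names and the `triage` function are hypothetical, and the input is assumed to be the dict from `result.data.model_dump()`.

```python
def triage(review: dict) -> str:
    """Map review signals to a queue name: 'manual_review', 'enrich', or 'publish'."""
    if review["needs_manual_review"] or review["visible_defects"]:
        return "manual_review"  # a person should look at the image
    if review["missing_information"]:
        return "enrich"  # image is fine, but catalog details must come from elsewhere
    return "publish"


# Made-up record in the shape of ProductReview.model_dump()
sample = {
    "visible_defects": [],
    "missing_information": ["price"],
    "needs_manual_review": False,
    "reviewer_note": "Price is not visible on the packaging.",
}
print(triage(sample))  # enrich
```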

Complete Workflow Example

Here's a complete function that processes a product image and returns all catalog information:

python
from pydantic import BaseModel, Field
from viscribe.images import extract

class CatalogEntry(BaseModel):
    name: str
    brand: str | None
    category_path: str
    features: list[str]
    description: str
    tags: list[str]
    visible_quality_issues: list[str]
    confidence: float
    needs_review: bool


def catalog_product(image_url: str) -> dict:
    result = extract(
        image_url=image_url,
        output_schema=CatalogEntry,
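        instruction=(
            "Create a publishable catalog record. Choose a category from: "
            "Electronics > Computers > Laptops, Electronics > Tablets, "
            "Electronics > Accessories, Office Supplies, Gaming Equipment. "
            "Write concise listing copy, and flag uncertainty for manual review."
        )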
    )

    return result.data.model_dump()


catalog_entry = catalog_product("https://example.com/product.jpg")
print(catalog_entry)
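
To run the function over a supplier feed, one option is to write one JSON line per image and record failures instead of stopping at the first bad file. This is a sketch under stated assumptions: `catalog_batch` is a hypothetical wrapper, and `fake_catalog` stands in for `catalog_product` so the example runs without network access.

```python
import io
import json
from typing import Callable, Iterable, TextIO


def catalog_batch(image_urls: Iterable[str], catalog_fn: Callable[[str], dict], out: TextIO) -> int:
    """Write one JSON line per image and return the number of failures."""
    failures = 0
    for url in image_urls:
        try:
            row = {"image_url": url, "ok": True, "entry": catalog_fn(url)}
        except Exception as exc:  # keep going so one bad image does not stop the batch
            failures += 1
            row = {"image_url": url, "ok": False, "error": str(exc)}
        out.write(json.dumps(row) + "\n")
    return failures


def fake_catalog(url: str) -> dict:
    """Stand-in for catalog_product; fails on one URL to show the error path."""
    if url.endswith("broken.jpg"):
        raise ValueError("image could not be read")
    return {"name": "Sample product", "needs_review": False}


buffer = io.StringIO()  # in practice, an open file such as catalog.jsonl
failed = catalog_batch(
    ["https://example.com/product.jpg", "https://example.com/broken.jpg"],
    fake_catalog,
    buffer,
)
rows = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(failed, [row["ok"] for row in rows])  # 1 [True, False]
```

In real use you would pass `catalog_product` as `catalog_fn` and an open file as `out`.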

Real-World Use Cases

1. Marketplace Sellers

Automatically catalog products from supplier images when importing inventory. This reduces onboarding time from hours to minutes per product.

2. Dropshipping Businesses

Extract and rewrite product information from manufacturer websites to create unique listings that avoid duplicate content issues.

3. Inventory Management Systems

Process images of warehouse items to maintain accurate inventory records with proper categorization and descriptions.

4. Product Research Sites

Extract specifications from multiple product images to build comparison tables automatically.

Advanced: Handling Complex Products

For products with complex nested data, pass a richer Pydantic model as the output schema:

python
from pydantic import BaseModel, Field
from typing import List, Optional
from viscribe.images import extract

class Specification(BaseModel):
    processor: str
    ram: str
    storage: str
    display: str
    graphics: Optional[str]

class Product(BaseModel):
    name: str
    brand: str
    price: Optional[float]
    specifications: Specification
    warranty_years: Optional[int]
    evidence: List[str] = Field(description="Visible clues used to fill the record")

result = extract(
    image_url="https://example.com/laptop-specs.jpg",
    output_schema=Product,
    instruction="Extract a typed laptop catalog record from the visible image and spec panel."
)

product = result.data
print(f"Processor: {product.specifications.processor}")
print(f"RAM: {product.specifications.ram}")
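
Nested records are convenient in Python but awkward in spreadsheets and flat catalog tables. A possible way to bridge the gap is to flatten the dict from `product.model_dump()` into dotted column names. `flatten` is an illustrative helper, not part of viscribe, and the record below is made up.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys and join lists into one string."""
    flat: dict = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = "; ".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat


# Made-up record in the shape of Product.model_dump()
record = {
    "name": "Example laptop",
    "price": None,
    "specifications": {"processor": "Intel i7", "ram": "16GB"},
    "evidence": ["spec panel", "brand logo"],
}
flat = flatten(record)
print(flat["specifications.ram"], "|", flat["evidence"])  # 16GB | spec panel; brand logo
```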

Performance Tips

  • Batch Processing: Use the async aextract helper to process multiple images concurrently
  • Caching: Store results to avoid re-processing the same images
  • Error Handling: Implement retry logic for failed requests
  • Rate Limiting: Respect API limits and implement exponential backoff
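
The caching and retry tips can be sketched with the standard library alone. Both helpers below are hypothetical and wrap any callable, so they do not depend on viscribe internals. The sketch retries on every exception for brevity; in practice you would likely retry only transient errors such as timeouts or rate-limit responses. The cache is an in-memory dict, so persisting it is left to you.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise RuntimeError("max_attempts must be at least 1")


def cached_call(cache: dict, image_url: str, compute: Callable[[str], dict]) -> dict:
    """Return the stored result for image_url, computing it only on a miss."""
    if image_url not in cache:
        cache[image_url] = compute(image_url)
    return cache[image_url]


attempts: list = []


def flaky_extract(url: str) -> dict:
    """Stand-in for an extraction call that fails twice, then succeeds."""
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return {"name": "Sample product"}


cache: dict = {}
delays: list = []  # recorded instead of sleeping so the demo runs instantly
url = "https://example.com/product.jpg"
fetch = lambda u: call_with_backoff(lambda: flaky_extract(u), sleep=delays.append)
first = cached_call(cache, url, fetch)
second = cached_call(cache, url, fetch)  # served from the cache, no new attempts
print(len(attempts), len(delays), first == second)  # 3 2 True
```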

Async Processing for Scale

When processing thousands of products, use aextract for better throughput:

python
import asyncio
from pydantic import BaseModel
from viscribe.images import aextract

class ProductSummary(BaseModel):
    name: str
    description: str
    tags: list[str]

async def process_products(image_urls):
    tasks = [
        aextract(
            image_url=url,
            output_schema=ProductSummary,
            instruction="Extract a concise catalog summary and searchable tags."
        )
        for url in image_urls
    ]

    results = await asyncio.gather(*tasks)
    return [result.data.model_dump() for result in results]

# Process the products concurrently. The "..." below is a placeholder for the
# rest of your URLs; remove it before running.
image_urls = ["https://example.com/product1.jpg", ...]
results = asyncio.run(process_products(image_urls))
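
The `asyncio.gather` call above starts every request at once, which may run into provider rate limits on large batches, and a single exception aborts the whole run. One common way to handle both is a semaphore plus `return_exceptions=True`. In this sketch, `gather_bounded` is a hypothetical helper and `fake_worker` stands in for a coroutine that calls `aextract`, so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable


async def gather_bounded(
    image_urls: list[str],
    worker: Callable[[str], Awaitable[dict]],
    limit: int = 10,
) -> list:
    """Run worker over every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(url: str):
        async with semaphore:
            return await worker(url)

    # return_exceptions=True keeps one failure from aborting the batch;
    # failed items come back as exception objects in the same position.
    return await asyncio.gather(*(run_one(url) for url in image_urls), return_exceptions=True)


stats = {"in_flight": 0, "peak": 0}


async def fake_worker(url: str) -> dict:
    """Stand-in for an aextract call; tracks concurrency and fails on one URL."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    if url.endswith("3.jpg"):
        raise ValueError("unreadable image")
    return {"image_url": url}


urls = [f"https://example.com/product{i}.jpg" for i in range(1, 7)]  # placeholder URLs
results = asyncio.run(gather_bounded(urls, fake_worker, limit=2))
failed = [u for u, r in zip(urls, results) if isinstance(r, Exception)]
print(stats["peak"], failed)  # 2 ['https://example.com/product3.jpg']
```

A sensible value for `limit` depends on your provider's rate limits, so treat the default of 10 as a placeholder.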

Measuring Success

Companies using ViscribeAI for product cataloging typically see:

  • 80-90% reduction in manual data entry time
  • 95%+ accuracy in extracted data
  • Consistent quality across thousands of products
  • Faster time-to-market for new products
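
Figures like these only mean something for your catalog once you measure them on your own images. A simple starting point is to hand-label a small sample and compare it field by field with the extracted records. The sketch below uses exact-match comparison, which suits fields such as brand or category but is too strict for free-text fields like descriptions. `field_accuracy` and the sample data are illustrative.

```python
def field_accuracy(predicted: list[dict], expected: list[dict], fields: list[str]) -> dict[str, float]:
    """Return the share of records where each field exactly matches the label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one labeled record is required")
    scores: dict[str, float] = {}
    for field in fields:
        matches = sum(1 for p, e in zip(predicted, expected) if p.get(field) == e.get(field))
        scores[field] = matches / len(expected)
    return scores


# Made-up extracted records and hand labels, in the same order
predicted = [
    {"brand": "Dell", "color": "Silver"},
    {"brand": "Acme", "color": "Black"},
]
expected = [
    {"brand": "Dell", "color": "Silver"},
    {"brand": "Acme", "color": "Blue"},
]
print(field_accuracy(predicted, expected, ["brand", "color"]))  # {'brand': 1.0, 'color': 0.5}
```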

Start Building

Ready to automate your product cataloging? Install the library, choose the model provider you already use, and explore the full documentation for examples and best practices. You can also tell us what you are building through the community form.

