How to Build an AI-Powered Restaurant Recommender Like 'Vibe Code' Using LLMs and ClickHouse
End-to-end guide to building a social, AI-driven restaurant recommender using LLMs and ClickHouse—architecture, prompts, data model, and analytics.
Stop guessing where to eat: build a fast, social restaurant recommender that actually understands your group's vibe
Decision fatigue is real. Developers and product teams building social dining tools face a recurring problem: how to combine personal taste, real-time context (who's available, budget, distance), and explainable recommendations without shipping a heavyweight ML stack. In 2026, with lightweight LLMs, affordable embeddings, and ClickHouse's real-time OLAP + vector capabilities, you can build a micro app like Rebecca Yu’s Where2Eat — but production-ready, auditable, and scalable.
What you’ll get from this guide
- End-to-end architecture for a social restaurant recommender using LLMs for natural language understanding and explanations, and ClickHouse for fast analytics and vector search.
- Concrete data model and ClickHouse schemas you can copy-and-adapt.
- Prompt templates and prompt engineering patterns for reliable suggestions and constrained reasoning.
- Real-time pipelines, ranking strategy, and observability/analytics to measure success.
- Security, privacy, and cost trade-offs for 2026 deployments.
Why build this now (2026 context)
By late 2025 and into 2026 the LLM and analytics landscape matured in ways that enable micro apps to be both cheap and powerful. Open and efficient models, multi-modal capabilities, and improvements in vector search mean you don't need a billion-dollar data warehouse to run personalized recommender features. ClickHouse's momentum (notably its major funding round and rapid product advances in 2025–26) pushed vector search and real-time aggregation into mainstream infrastructure — ideal for a small, latency-sensitive recommender app.
High-level architecture
Design principles: keep the user flow snappy, make recommendations explainable, and separate retrieval from generation for cost control. Here's a pragmatic architecture for a micro app:
- Edge API / Inference Gateway: routes requests to the appropriate LLM or embedding model, enforces rate limits, and applies caching.
- Retrieval Layer (ClickHouse): stores restaurant metadata, embeddings, interactions, and serves fast vector + filter queries to return candidate lists.
- Reranker (LLM or lightweight model): takes the candidate set + chat context and produces a ranked list with short natural-language rationale.
- Event Stream (Kafka/Pulsar): streams interactions back into ClickHouse for analytics and model-feedback loops.
- Observability & Dashboard: Superset/Metabase or custom UI backed by ClickHouse for metrics like precision@k, latency, and conversion.
Why ClickHouse?
ClickHouse lets you combine OLAP-scale aggregation and real-time ingestion with efficient vector retrieval. That means a single primary datastore can answer both “which restaurants match this embedding?” and “how often did this suggestion lead to a reservation?” without expensive ETL. For a micro app you get speed and lower operational complexity.
Data model: what to store
Keep the model simple but query-friendly. You’ll need four core types of objects:
- Restaurant catalog: static metadata (name, cuisine, price, hours, lat/lon, tags, menu links, verified cleanliness/safety flags).
- Embeddings: dense vectors representing restaurant descriptions, menus, and images (if using multi-modal embeddings).
- User & group profiles: preference signals and short text bios; for privacy, keep PII minimal and store hashed IDs.
- Interactions/events: impressions, clicks, saves, RSVPs, and chat messages — streamed into ClickHouse for analytics and training signals.
Example ClickHouse schema
Below are compact table definitions you can adapt. They use ClickHouse's MergeTree engine family for fast writes and queries. Embeddings are stored as Array(Float32), with similarity computed using SQL array functions.
-- restaurants table
CREATE TABLE restaurants (
restaurant_id UInt64,
name String,
description String,
cuisine Array(String),
price_tier UInt8,
lat Float64,
lon Float64,
tags Array(String),
embedding Array(Float32),
updated_at DateTime
) ENGINE = MergeTree()
ORDER BY (restaurant_id);
-- interactions/events (streamed)
CREATE TABLE events (
event_time DateTime,
user_id UInt64,
group_id UInt64,
event_type String, -- impression/click/rsvp
restaurant_id UInt64,
metadata JSON
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (event_time, user_id, restaurant_id);
-- Plain (replicated) MergeTree keeps every event; a ReplacingMergeTree keyed
-- only on event_time would silently collapse distinct events that share a timestamp.
Note: newer ClickHouse releases support approximate nearest-neighbour (vector search) indexes over Array(Float32) columns; if your version supports them, add one to speed up similarity queries. The plain Array(Float32) scan is portable and works well at micro-app scale.
Retrieval + Reranking pipeline
Split the recommendation into two steps for both cost and responsiveness:
- Retrieve (ClickHouse): run a hybrid query that uses vector similarity to get ~50 candidates, combined with deterministic filters (open now, within walking distance, fits budget).
- Rerank (LLM / lightweight cross-encoder): feed the small candidate set plus the chat context to an LLM for personalization, explanation, and constraint handling (e.g., allergies).
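A minimal sketch of that split in Python, with toy stand-ins for the retrieval query and the LLM call (fetch_candidates and llm_rerank are hypothetical; wire in your ClickHouse client and model API):
# Sketch of the two-step pipeline; fetch_candidates and llm_rerank are
# hypothetical stand-ins for your ClickHouse client and LLM API.
def fetch_candidates(query_emb, filters, limit=50):
    # In production: bind the filters into the hybrid vector + filter SQL shown
    # below and run it against ClickHouse. Toy data keeps the sketch runnable.
    return [
        {"id": 1, "name": "Spice Garden", "tags": {"spicy", "thai"}, "score": 0.97},
        {"id": 2, "name": "Green Table", "tags": {"vegan", "casual"}, "score": 0.88},
    ]

def llm_rerank(candidates, chat_context):
    # In production: render the rerank prompt with the candidate block plus
    # chat context and parse the JSON reply. Here: sort by retrieval score.
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

def recommend(query_emb, filters, chat_context, top_k=3):
    candidates = fetch_candidates(query_emb, filters)   # cheap, wide net
    ranked = llm_rerank(candidates, chat_context)       # expensive, narrow set
    return ranked[:top_k]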
Candidate retrieval SQL (cosine similarity)
If your ClickHouse doesn't provide a built-in cosine function (recent releases ship cosineDistance), compute it with SQL array ops. This example returns the top 50 restaurants closest to the query embedding while filtering by price and distance; greatCircleDistance takes longitude/latitude in degrees and returns meters.
SELECT
    restaurant_id,
    name,
    price_tier,
    tags,
    arraySum(arrayMap((x, y) -> x * y, embedding, {query_emb})) AS dot,
    sqrt(arraySum(arrayMap(x -> x * x, embedding))) AS norm_r,
    sqrt(arraySum(arrayMap(x -> x * x, {query_emb}))) AS norm_q,
    dot / (norm_r * norm_q) AS cosine
FROM restaurants
WHERE price_tier <= 2
    AND greatCircleDistance(lon, lat, {user_lon}, {user_lat}) < {max_dist_meters}
ORDER BY cosine DESC
LIMIT 50;
Run this query from the inference gateway after generating the query embedding with whichever embedding model you use. Keep embeddings normalized on insert to avoid repeated norm calculations.
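With normalized vectors the cosine reduces to a plain dot product, so the norm terms above can be dropped. A minimal helper (standard-library Python) you might run before inserting embeddings:
import math

def normalize(vec):
    # Scale to unit length so cosine similarity equals a plain dot product.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

emb = normalize([0.3, 0.4, 0.5])
assert abs(sum(x * x for x in emb) - 1.0) < 1e-9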
Prompt engineering: templates that work
In 2026, robust prompt patterns focus on constrained generation, provenance, and token budget control. Use system-level instructions and give the LLM only the necessary context: short chat history, user preferences, and the pre-ranked candidate block.
Two-step prompt pattern
- Rerank prompt — concise, deterministic instructions to sort candidates and produce a short explanation per item.
- Explain/Share prompt — produce a human-facing message for the group chat, including 1-sentence rationale and confidence.
System: You are an assistant that ranks restaurants for a small group. Follow constraints exactly. Do not hallucinate menu items or ratings.
User: Conversation: "Three of us — two vegans, one loves spicy food. Budget: $$. Open at 7pm. Prefer somewhere walkable."
Candidates:
1) Name: Spice Garden; Tags: spicy, thai; Score: 0.97
2) Name: Green Table; Tags: vegan, casual; Score: 0.88
3) Name: Cafe North; Tags: coffee, pastries; Score: 0.65
Task: Return a JSON array of top 3 with keys {id, name, final_rank, reason (15-25 words), confidence (0.0-1.0)}. Keep reasons factual (cite tags and distance if relevant). If a candidate violates constraints (closed, no vegan options), exclude it.
Why this works: provide the LLM with vetted candidates (reduces hallucination) and force structured output for easy parsing.
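Because the output is constrained JSON, parsing stays trivial. A defensive sketch (key names mirror the prompt above) that rejects malformed replies instead of trusting the model:
import json

REQUIRED_KEYS = {"id", "name", "final_rank", "reason", "confidence"}

def parse_rerank_output(raw):
    # Parse and validate the LLM's JSON array; raise on anything malformed.
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    for item in items:
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"missing keys: {missing}")
        if not 0.0 <= float(item["confidence"]) <= 1.0:
            raise ValueError("confidence out of range")
    return sorted(items, key=lambda i: i["final_rank"])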
Handling hard constraints and safety
- Allergies & dietary rules: enforce in the retrieval filter first. If retrieval cannot guarantee, have the LLM flag uncertainty rather than inventing facts.
- Open/closed hours: prefer deterministic checks against the restaurant metadata (see the sketch after this list).
- Content safety: apply content filters on chat input before sending to the LLM. Log user consent for storing chat if you plan to use it for training.
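Hours checks belong in deterministic code, not in the prompt. A sketch assuming hours are stored per weekday as (open, close) minute-of-day windows (your catalog format will likely differ):
from datetime import datetime

def is_open(hours, when):
    # hours maps weekday (0 = Monday) to a list of (open_min, close_min) windows.
    minute = when.hour * 60 + when.minute
    return any(o <= minute < c for o, c in hours.get(when.weekday(), []))

friday_hours = {4: [(11 * 60, 22 * 60)]}                    # open 11:00-22:00 Fridays
print(is_open(friday_hours, datetime(2026, 1, 2, 19, 0)))   # True: Friday, 7pm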
Real-time analytics and feedback loop
One of the advantages of building on ClickHouse is fast iterative analytics. Stream every event (impression, click, save, RSVP) into ClickHouse and define materialized views for product metrics:
-- The MV's target table must exist first; SummingMergeTree merges the partial counts.
CREATE TABLE daily_metrics (
    day Date, total_events UInt64, clicks UInt64, rsvps UInt64
) ENGINE = SummingMergeTree() ORDER BY day;

CREATE MATERIALIZED VIEW mv_daily_metrics TO daily_metrics AS
SELECT
    toDate(event_time) AS day,
    count() AS total_events,
    countIf(event_type = 'click') AS clicks,
    countIf(event_type = 'rsvp') AS rsvps
FROM events
GROUP BY day;

-- Derive CTR at query time; ratios stored per insert block don't merge correctly:
-- SELECT day, sum(clicks) / sum(total_events) AS ctr FROM daily_metrics GROUP BY day;
Key metrics to track:
- Precision@K — percentage of top K suggestions that lead to a click/reservation (see the sketch after this list).
- CTR and conversion — impression to click and click to RSVP rates.
- Latency p95 — end-to-end recommendation latency (target <1s for micro apps; can accept longer for LLM reranking if UX is async).
- Diversity — ensure repeated suggestions are not stale across sessions for the same group.
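As an illustration, precision@K over logged sessions (a toy in-memory version; in practice this is a query over the events table):
def precision_at_k(sessions, k=5):
    # sessions: list of (shown_ids_in_rank_order, clicked_id_set) pairs.
    hits = total = 0
    for shown, clicked in sessions:
        top = shown[:k]
        hits += sum(1 for r in top if r in clicked)
        total += len(top)
    return hits / total if total else 0.0

sessions = [([1, 2, 3, 4, 5], {2, 5}), ([6, 7, 8, 9, 10], {6})]
print(precision_at_k(sessions))  # 0.3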
Performance, cost, and hybrid ranking
LLM calls are the most expensive part. Use a hybrid approach:
- Use ClickHouse vector retrieval for the heavy lifting.
- Apply a cheap learned or rule-based reranker for low-latency interactions (e.g., machine-learned linear model). Use the LLM only for final human-readable explanations or weekly personalization updates.
- Cache results for repeated queries (same group + same context) to avoid recomputation.
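A minimal cache sketch using only the standard library: the key combines the group ID with a hash of the normalized context, so identical asks within the TTL skip recomputation (the TTL value is an assumption to tune):
import hashlib
import time

_cache = {}          # (group_id, context_hash) -> (expires_at, result)
TTL_SECONDS = 600    # assumption: results stay fresh for 10 minutes

def cached_recommend(group_id, context, compute):
    # compute is a zero-argument callable that runs the full pipeline.
    key = (group_id, hashlib.sha256(context.strip().lower().encode()).hexdigest())
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]
    result = compute()
    _cache[key] = (time.time() + TTL_SECONDS, result)
    return result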
Personalization & cold-start
For micro apps used by small friend groups, personalization must work with limited data:
- Profile seeding: seed users with quick preference sliders (spicy, quiet, budget) at signup.
- Group embeddings: compute a group embedding by averaging member embeddings plus the chat context embedding; use that as the query vector (sketch after this list).
- Cold start: fall back to popularity + recency + cuisine diversity when user data is sparse.
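A sketch of that group-embedding step: average member vectors, blend in the chat-context vector (the 0.5 context weight is an assumption to tune), and renormalize:
import math

def group_embedding(member_embs, context_emb, context_weight=0.5):
    # Average member preference vectors, then blend with the chat-context vector.
    dim = len(context_emb)
    avg = [sum(m[i] for m in member_embs) / len(member_embs) for i in range(dim)]
    blended = [(1 - context_weight) * avg[i] + context_weight * context_emb[i]
               for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in blended))   # renormalize for cosine retrieval
    return [x / norm for x in blended] if norm > 0 else blended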
Observability & debugging
Make decisions auditable. Store the candidate list and the reranker input/output for every recommendation request (redact PII). This makes it possible to debug when the app suggests inappropriate options and iteratively improve prompts.
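One lightweight shape for those audit records, as a sketch (field names are illustrative, not a fixed schema):
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    request_id: str
    group_id_hash: str            # hashed, never raw PII
    candidate_ids: list
    reranker_prompt: str          # redact PII before storage
    reranker_output: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def to_row(record):
    # Serialize for insertion into an audit table or object store.
    return json.dumps(asdict(record))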
Privacy and compliance (practical rules)
- Keep personally identifying data minimal and encrypted. Use hashed user IDs for analytics (see the sketch after this list).
- Offer an opt-out for using chat data for model improvement; store opt-outs in ClickHouse and respect them upstream.
- Local vs cloud models: if sensitive data is involved, prefer self-hosted inference or models served under a contractual no-log guarantee.
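One caveat on hashed IDs: a bare SHA-256 over a small ID space is trivially reversible by brute force, so prefer a keyed hash. A sketch using HMAC with a server-side secret (the secret handling here is a placeholder assumption):
import hmac
import hashlib

SECRET = b"rotate-me-and-store-in-a-secrets-manager"  # placeholder, not for production

def hash_user_id(user_id):
    # Keyed hash so analytics IDs can't be reversed by enumerating user IDs.
    return hmac.new(SECRET, str(user_id).encode(), hashlib.sha256).hexdigest()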
Example implementation checklist (step-by-step)
- Gather a restaurant catalog (initial seed: OpenTable/Yelp APIs, manual CSV). Normalize schema to the restaurants table above.
- Pick embedding model and generate embeddings for restaurants. Insert into ClickHouse (normalize vectors).
- Build a simple client UI to capture group members and chat context.
- Implement the inference gateway: generate query embedding, call ClickHouse retrieve query, call reranker LLM with top candidates, return JSON to client.
- Stream interaction events to ClickHouse and build dashboards (CTR, precision@K, latency).
- Iterate on prompts and rules; add caching and a cheap reranker to control costs.
Benchmarking & expected numbers (practical targets)
Use these as starting SLAs for a micro app in 2026:
- Recommendation latency: p50 < 400ms (retrieval + cheap rerank), full LLM explanation p95 < 1.5s.
- Cost per active recommendation: target <$0.02 using hybrid rerank & caching (highly model-dependent).
- Precision@5: aim for 0.35–0.5 in early iterations; improve with feedback loop and personalization.
Future-proofing & 2026 trends
Trends to watch and adopt:
- On-device LLMs: increasing feasibility for private, offline recommendations on mobile devices.
- Multi-modal embeddings: combining menu text, photos, and short reviews into a joint embedding space for richer retrieval.
- ClickHouse vector acceleration: continued improvements mean more of your retrieval and analytics can live in the same store.
- Explainability as a feature: users prefer transparent reasons; LLMs make this cheap and natural.
Make the recommender auditable: store candidates, model inputs, and outputs so product and legal teams can explain any suggestion.
Common pitfalls and how to avoid them
- Hallucinations: avoid passing raw catalog-free prompts to the LLM. Always feed vetted candidate data for generation.
- Latency surprises: measure end-to-end and isolate LLM cost vs retrieval cost. Add a cheap reranker for quick responses.
- Privacy leaks: never store raw chat transcripts without explicit consent; use hashing and redaction.
- Overfitting to popularity: enforce diversity constraints in reranking and periodically surface niche options.
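For the diversity constraint, a greedy maximal-marginal-relevance (MMR) pass over the candidate set is a common approach. A sketch using tag overlap as the similarity signal (swap in embedding similarity if available; the 0.7 relevance weight is an assumption):
def mmr_select(candidates, k=5, lambda_rel=0.7):
    # Greedy MMR: trade relevance against similarity to already-picked items.
    # candidates: list of dicts with "score" (relevance) and "tags" (set).
    def sim(a, b):  # Jaccard overlap on tags as a cheap similarity proxy
        union = a["tags"] | b["tags"]
        return len(a["tags"] & b["tags"]) / len(union) if union else 0.0

    picked, pool = [], list(candidates)
    while pool and len(picked) < k:
        best = max(pool, key=lambda c: lambda_rel * c["score"]
                   - (1 - lambda_rel) * max((sim(c, p) for p in picked), default=0.0))
        picked.append(best)
        pool.remove(best)
    return picked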
Actionable takeaways
- Combine ClickHouse vector retrieval with an LLM reranker: retrieval for scale, LLM for nuance.
- Use structured prompts and filtered candidate inputs to prevent hallucinations and keep generation deterministic.
- Stream all events into ClickHouse and iterate quickly on metrics like precision@K, CTR, and latency.
- Start small: a micro app with a simple data model gets you to usable recommendations fast; add personalization after you have event signals.
Next steps & call to action
If you want a jumpstart, clone a starter repo (backend + ClickHouse schema + prompt templates), seed it with a small catalog, and run a 48-hour experiment with a friend group. Measure CTR and watch the top suggestions evolve — you’ll be surprised how fast it improves with real interactions.
Ready to build? Start by defining your minimal catalog and wiring ClickHouse for retrieval. If you'd like, download our ClickHouse schema and prompt templates to get a working prototype in under a week — then iterate using the analytics patterns above.