meal metrics ai
grandma's recipes don't come with nutrition labels. an api that structures any recipe and calculates macros.
the origin
it started with massimo bottura's tortellini in crema di parmigiano.
i love cooking. not meal prep—cooking. the kind where you spend three hours on a single dish because the process matters as much as the result. but i also track everything i eat. not out of restriction—out of curiosity. longevity isn't about deprivation. it's about understanding what you're putting into your body.
so there i was. this beautiful dish—hand-folded tortellini, aged parmigiano cream, a hint of nutmeg. i wanted to log it.
and i hit a wall.
the recipe said "a generous handful of parmigiano." how much is generous? 30 grams? 80? "reduce the cream until it coats the spoon"—how much cream is left after reduction? the tortellini filling was mortadella, prosciutto, pork loin in unspecified ratios.
i spent an hour reverse-engineering the macros of a dish i'd just spent three hours perfecting.
the irony: i could execute a michelin-level recipe, but couldn't tell you its protein content within 20 grams.
this isn't a one-time problem. it happens every day, to anyone who refuses to choose between eating well and eating informed.
the bigger picture
that frustration sent me down a rabbit hole. and what i found was worse than i expected.
the numbers:
- 510 million people follow #fitness on Instagram
- billions of recipe videos across YouTube, TikTok, Instagram
- 400,000+ foods in the USDA database
- zero standardized way to connect any of it
every recipe exists in isolation. every food database exists in isolation. the knowledge is there. it's just trapped.
look at what recipe creators actually write:
| what they write | what it means | calorie variance |
|---|---|---|
| "a splash of olive oil" | 5ml or 30ml? | 0-240 kcal |
| "season generously" | how much salt? | unknown |
| "reduce until thick" | how much is left? | 30-50% concentration |
| "add garlic to taste" | 1 clove or 5? | minimal, but... |
and it gets worse.
cooking methods change everything. research in the Korean Journal for Food Science of Animal Resources found that boiling chicken reduces protein by up to 23% compared to roasting. the protein literally leaches into the water.
one method choice. double-digit macro difference.
this isn't just my problem. athletes optimizing performance. people managing chronic conditions. anyone trying to understand what they're actually eating.
the infrastructure doesn't exist.
the insight
design background. i learned to see systems before screens.
and nutrition, i realized, isn't a database problem. it's a graph problem.
traditional calorie trackers think: "100g chicken breast = 31g protein." done.
but that's not how food works. look at what actually happens with bottura's tortellini:
| element | what a database sees | what actually matters |
|---|---|---|
| parmigiano | 392 kcal/100g | age affects moisture → weight → calories |
| cream | 340 kcal/100ml | reduction concentrates fat → volume unknown |
| pork filling | sum of parts | mortadella:prosciutto ratio changes protein density |
| cooking | ignored | pasta absorbs water → weight changes → serving math breaks |
the USDA tells me 100g parmigiano-reggiano contains 35.8g protein.
what it can't tell me: how much parmigiano ends up in my dish when the recipe says "generous handful."
a database stores facts in isolation. a graph stores relationships—ingredients, preparations, cooking methods, final nutritional outcomes.
the difference between "what is chicken breast?" and "what happens to chicken breast when i roast it versus boil it, and how does that change if i marinate it first?"
this is where meal metrics ai diverges from every calorie counter ever built.
not another lookup tool. a knowledge graph that understands food the way a nutritionist does—through relationships, context, inference.
the problem isn't missing data. it's missing connections.
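the facts-versus-relationships distinction fits in a few lines. a toy triple store; the retention values come from the chicken table later in this post, but the node names and functions are illustrative, not the production schema:

```python
# a toy triple store: (subject, relation, object) rows. a flat fact table can
# answer "what is chicken breast?"; the triples can answer "what happens to it?"
TRIPLES = [
    ("chicken breast", "COOKED_BY", "roasting"),
    ("chicken breast", "COOKED_BY", "boiling"),
    ("roasting", "PROTEIN_RETENTION", 1.00),
    ("boiling", "PROTEIN_RETENTION", 0.77),
]

def neighbors(subject: str, relation: str) -> list:
    """follow one relation out of a node."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

def retention(ingredient: str, method: str):
    """two-hop traversal: ingredient -> cooking method -> retention coefficient."""
    if method in neighbors(ingredient, "COOKED_BY"):
        return neighbors(method, "PROTEIN_RETENTION")[0]
    return None  # relationship not in the graph
```

the lookup-table answer never changes. the traversal answer depends on the path you take through it.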
the science
before building, i needed to understand how food knowledge is formally structured. and why existing approaches fail.
food ontologies
researchers have tried to formalize food knowledge into machine-readable structures:
FoodOn — the most comprehensive open-source food ontology. vocabulary for nutritional analysis, chemical components, agricultural practices, food safety. supports FAIR data annotation across academia, government, commercial.
FoodKG — unifies multiple ontologies with recipe instances from Recipe1M+ and USDA nutrient records. published at ISWC 2019. demonstrates how knowledge graphs can support recipe recommendations, ingredient substitutions, nutritional Q&A.
ONS (Ontology for Nutritional Studies) — integrates terminology across dietary research. FOBI (Food-Biomarker Ontology) — links foods to metabolites.
the gap none of them address: real-time extraction from unstructured content.
they assume structured input. the world produces unstructured content.
the chemistry of cooking
understanding nutrient retention means understanding what happens at the molecular level.
protein denaturation — begins at 40°C. denatured proteins may be more digestible (unfolded structures = more surface area for enzymes) but can cross-link under prolonged heat, reducing bioavailability.
maillard reaction — browning above 140°C creates new compounds from amino acids and reducing sugars. great for flavor. but those amino acids are "lost" nutritionally.
water-soluble nutrient leaching — explains why boiled chicken has less protein than roasted. proteins denature and dissolve into cooking liquid. if you're not drinking the broth, you're losing nutrients. research quantified this: boiled chicken breast retains only 77% of protein vs 100% for roasted.
fat-soluble vitamin retention — opposite pattern. cooking with fat increases absorption of vitamins A, D, E, K. sautéed vegetables may deliver more bioavailable nutrients than raw.
this is why "100g chicken = 31g protein" fails. the cooking method is part of the equation.
named entity recognition for recipes
extracting structured data from recipe text is a specific NLP challenge.
BiLSTM-CRF architectures — combine bidirectional LSTMs (past and future context) with Conditional Random Fields (enforce valid label sequences). the RNE method achieves F1 scores of 96.09% on ingredient extraction.
BERT variants — DistilBERT, RoBERTa fine-tuned for recipe NER. best models hit F1 ≥ 0.95 on standard datasets.
but these models train on clean, well-formatted recipes.
real-world content—instagram captions, handwritten notes, video transcriptions—is messier. performance degrades significantly.
the gap between academic benchmarks and production accuracy? that's the research challenge.
the hard problems
the goal: a system that extracts accurate nutritional data from any recipe format. these are the research questions:
1. ambiguous measurements — how do you convert "a handful" to grams? a bodybuilder's handful isn't a home cook's handful. "a cup" means 240ml in america, 250ml in metric countries.
2. context-dependent ingredients — "protein powder" could be whey isolate (25g protein/30g scoop), plant blend (18g/30g), or collagen (10g/30g). same words, different products. how do you infer the right one?
3. cooking method effects — 200+ cooking methods. published research covers the big ones (roasting, boiling, steaming) for common proteins. but sous vide? pressure cooking? fermentation? the long tail lacks data.
4. format normalization — recipes appear as videos, blog posts, instagram captions, cookbook scans, handwritten notes. how do you extract structured ingredients from all of this?
5. ontological ambiguity — is tofu a protein or a vegetable? both. depends on your dietary framework. the system needs to represent ambiguity, not force premature resolution.
these are open questions. the optimal solutions aren't known. that's what makes this research.
the architecture
three-stage pipeline. arbitrary recipe content → accurate nutritional intelligence.
- content ingestion — URLs, screenshots, text, video → LLM parsing, OCR, audio transcription
- ingredient resolution — "a handful of spinach" → "30g raw spinach" with confidence scoring
- graph query — Neo4j traversal across 50,000+ ingredients and 200+ cooking methods
output: complete macro/micronutrient breakdown with confidence scores and bioavailability estimates.
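a minimal sketch of how the three stages chain together. everything here is illustrative: the function names, the hardcoded heuristics, and the per-100g numbers stand in for the real parsers, resolver, and graph traversal.

```python
from dataclasses import dataclass

@dataclass
class ResolvedIngredient:
    name: str
    grams: float
    confidence: float

def ingest(raw: str) -> list[str]:
    # stage 1: normalize arbitrary content to a flat list of ingredient phrases.
    # the real pipeline routes URLs, images, and video through OCR or transcription first.
    return [line.strip() for line in raw.splitlines() if line.strip()]

def resolve(phrase: str) -> ResolvedIngredient:
    # stage 2: map vague phrases to gram estimates with a confidence score.
    # hardcoded heuristics stand in for the corpus-driven resolver.
    heuristics = {
        "a handful of spinach": ("spinach", 30.0, 0.87),
        "splash of olive oil": ("olive oil", 13.5, 0.72),  # ~15ml at ~0.9 g/ml
    }
    name, grams, conf = heuristics.get(phrase, (phrase, 0.0, 0.0))
    return ResolvedIngredient(name, grams, conf)

def query_graph(ing: ResolvedIngredient) -> dict:
    # stage 3: scale per-100g macros to the resolved weight;
    # a stand-in for the Neo4j traversal.
    macros_per_100g = {"spinach": {"protein": 2.9}, "olive oil": {"protein": 0.0}}
    base = macros_per_100g.get(ing.name, {})
    return {k: round(v * ing.grams / 100, 2) for k, v in base.items()}

recipe = "a handful of spinach\nsplash of olive oil"
results = [query_graph(resolve(p)) for p in ingest(recipe)]
```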
stage 1: content ingestion
extracting structured ingredient lists from unstructured content. the inputs:
text — blog posts, cookbook entries, social media captions. claude haiku for fast, cost-efficient parsing.
images — handwritten notes, cookbook scans, infographic recipes. OCR preprocessing → LLM extraction.
video — youtube, instagram reels, tiktok. whisper for audio transcription, visual analysis for on-screen text.
platform quirks — each source has unique conventions. normalize to common intermediate representation.
stage 2: ingredient resolution
the hardest problem isn't storing nutritional data. it's understanding what people mean.
"a handful of spinach":
- volume variance — hand size varies. could be 20-50g.
- preparation state — raw or cooked? packed or loose?
- context inference — smoothie = raw and packed. pasta sauce = wilted (more grams for the same volume).
multi-stage resolution:
| input | process | output | confidence |
|---|---|---|---|
| "a handful of spinach" | hand volume → 30g, context → smoothie → packed | 30g raw spinach | 0.87 |
| "splash of olive oil" | corpus analysis → 10-20ml, salad → 15ml | 15ml extra virgin | 0.72 |
| "medium onion, diced" | size standards → 100-120g, "medium" → 110g | 110g yellow onion | 0.94 |
| "protein powder" | context ambiguous, default common | 30g whey isolate | 0.61 |
the system learns measurement conventions across cuisines. "a cup" = 240ml american, 250ml metric. "chopped garlic" implies volume. "garlic cloves" implies count.
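the convention table is straightforward to encode. a sketch of region-aware cup conversion; the cup volumes come from the text, the densities are rough assumptions for illustration:

```python
# region-aware unit normalization. "a cup" is 240ml in american recipes,
# 250ml in metric ones.
CUP_ML = {"us": 240.0, "metric": 250.0}

# approximate densities in g/ml; illustrative values, not reference data
DENSITY_G_PER_ML = {
    "olive oil": 0.91,
    "milk": 1.03,
    "water": 1.00,
}

def cups_to_grams(cups: float, ingredient: str, region: str = "us") -> float:
    """convert a cup measure to grams using region + ingredient density."""
    ml = cups * CUP_ML[region]
    return round(ml * DENSITY_G_PER_ML.get(ingredient, 1.0), 1)
```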
current performance: 84% exact match on common ingredients.
the remaining 16% clusters in:
- ambiguous brand references (34%): "protein powder" without spec
- regional terminology (28%): "aubergine" vs "eggplant"
- novel ingredients (22%): "cauliflower rice," "zoodles"
- unclear prep states (16%): "cooked chicken" (boiled? roasted?)
stage 3: the knowledge graph
this is where meal metrics diverges from every nutritional database.
schema: 12 node types, 23 relationship types:
- ingredients → base nutritional profiles (USDA)
- cooking methods → nutrient retention coefficients (peer-reviewed)
- substitutions → equivalent mappings with nutritional deltas
- preparation states → raw, cooked, reduced, fermented
- recipe context → cuisine, meal category, dietary framework
example query: "if i roast instead of boil this chicken, how do macros change?"
```cypher
MATCH (chicken:Ingredient {name: "chicken breast"})-[:COOKED_BY]->(boiling:Method {name: "boiling"})
MATCH (chicken)-[:COOKED_BY]->(roasting:Method {name: "roasting"})
RETURN
  boiling.protein_retention  AS boiled_protein,   // 77%
  roasting.protein_retention AS roasted_protein,  // 100%
  roasting.protein_retention - boiling.protein_retention AS delta  // +23%
```
not a guess. calculated from published retention coefficients:
| method | breast | wing | leg |
|---|---|---|---|
| roasting | 100% | 94% | 100% |
| steaming | 98% | 95% | 96% |
| pan-frying | 95% | 89% | 93% |
| boiling | 77% | 83% | 77% |
counterintuitive: roasted chicken retains more protein than boiled.
the conventional wisdom that boiling is "healthier" ignores protein leaching into cooking liquid. if you're not drinking the broth, you're losing 23% of the protein you think you're eating.
the graph encodes this at scale. 200+ cooking methods. 50,000+ ingredients.
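outside the graph, the retention coefficients reduce the method question to one multiplication. a sketch using the chicken-breast numbers above:

```python
# protein retention coefficients for chicken breast, from the table above
RETENTION = {"roasting": 1.00, "steaming": 0.98, "pan-frying": 0.95, "boiling": 0.77}

USDA_PROTEIN_PER_100G = 31.0  # raw chicken breast, per the text

def cooked_protein(grams_raw: float, method: str) -> float:
    """protein actually delivered after cooking, given raw weight and method."""
    return round(grams_raw / 100 * USDA_PROTEIN_PER_100G * RETENTION[method], 1)
```

for a 200g raw breast, roasting delivers 62.0g of protein and boiling 47.7g: the 23% gap from the table, made concrete.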
the stack
| component | technology | why |
|---|---|---|
| graph database | Neo4j | native relationship queries, cypher, scales to millions |
| primary data | USDA FoodData Central | 400k+ foods, peer-reviewed, government-maintained |
| supplementary data | branded food APIs | commercial products, regional items |
| embeddings | custom fine-tuned | domain-specific food terminology |
| LLM | Claude Haiku 4.5 | fast parsing, structured output, cost-efficient |
| vector store | Pinecone | semantic search, ingredient fuzzy matching |
| validation | Pydantic | runtime type checking, confidence enforcement |
| API | FastAPI | REST interface, OpenAPI docs |
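the validation row does one job: refuse any extraction whose confidence leaves [0, 1]. a stdlib stand-in for the constraint; in the real stack this is a declarative `Field(ge=0.0, le=1.0)` on the pydantic model:

```python
from dataclasses import dataclass

@dataclass
class ResolvedIngredient:
    name: str
    grams: float
    confidence: float

    def __post_init__(self):
        # the checks pydantic's Field(gt=0) / Field(ge=0.0, le=1.0) enforce declaratively
        if self.grams <= 0:
            raise ValueError(f"grams must be positive: {self.grams}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
```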
why claude with structured outputs
LLM choice matters. recipe parsing needs consistently structured output. not just understanding language—producing data downstream systems can process without error handling for every edge case.
anthropic's structured outputs (december 2025) solve this:
1. guaranteed schema compliance — constrained decoding restricts token generation to valid JSON matching your schema. no parsing errors. no retry logic. no validation failures.
2. food safety — recipe content can contain dangerous advice ("add raw chicken to smoothie"). claude's constitutional AI training embeds safety constraints, reducing harmful misinformation propagation.
3. multi-model cost optimization:
| task | model | why |
|---|---|---|
| simple parsing | haiku 4.5 | fast (<500ms), cheap at volume |
| complex interpretation | sonnet 4.5 | better reasoning for ambiguous measurements |
| edge cases | sonnet 4.5 | max accuracy for novel ingredients |
| real-time API | haiku 4.5 | low latency |
4. native pydantic — SDK transforms pydantic models directly to JSON schemas.
in practice
```python
from anthropic import Anthropic
from pydantic import BaseModel


class Ingredient(BaseModel):
    name: str
    quantity: float
    unit: str
    preparation: str | None
    confidence: float


class RecipeExtraction(BaseModel):
    ingredients: list[Ingredient]
    cooking_method: str
    estimated_servings: int


client = Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Extract ingredients from this recipe:
        "Toss a generous handful of spinach with a splash of olive oil,
        a squeeze of lemon, and season generously with salt and pepper.
        Serves 2 as a side."
        """,
    }],
    output_format={
        "type": "json_schema",
        "json_schema": {
            "name": "recipe_extraction",
            "strict": True,
            "schema": RecipeExtraction.model_json_schema(),
        },
    },
)
```
response is guaranteed to match:
```json
{
  "ingredients": [
    {"name": "spinach", "quantity": 30, "unit": "g",
     "preparation": "raw", "confidence": 0.85},
    {"name": "olive oil", "quantity": 15, "unit": "ml",
     "preparation": null, "confidence": 0.72},
    {"name": "lemon juice", "quantity": 10, "unit": "ml",
     "preparation": "fresh squeezed", "confidence": 0.78},
    {"name": "salt", "quantity": 2, "unit": "g",
     "preparation": null, "confidence": 0.65},
    {"name": "black pepper", "quantity": 1, "unit": "g",
     "preparation": "ground", "confidence": 0.65}
  ],
  "cooking_method": "raw",
  "estimated_servings": 2
}
```
note the confidence scores. "generous handful" → 30g at 0.85. "season generously" → 2g salt at 0.65. the system is honest about uncertainty.
this eliminated 94% of extraction failures in testing.
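even with constrained decoding, a cheap shape check on the parsed response costs one function. a stdlib sketch; the production path would validate with the same pydantic models:

```python
import json

# the keys the schema guarantees on every ingredient
REQUIRED_INGREDIENT_KEYS = {"name", "quantity", "unit", "preparation", "confidence"}

def check_extraction(raw: str) -> dict:
    """parse the LLM response and assert the shape the schema guarantees."""
    data = json.loads(raw)
    assert {"ingredients", "cooking_method", "estimated_servings"} <= data.keys()
    for ing in data["ingredients"]:
        assert REQUIRED_INGREDIENT_KEYS <= ing.keys()
        assert 0.0 <= ing["confidence"] <= 1.0  # uncertainty stays bounded
    return data

sample = '''{"ingredients": [{"name": "spinach", "quantity": 30, "unit": "g",
 "preparation": "raw", "confidence": 0.85}],
 "cooking_method": "raw", "estimated_servings": 2}'''
parsed = check_extraction(sample)
```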
where it stands
iterating in public. the system is partially built. the open challenges define the research agenda.
what's working
- graph schema: 12 node types, 23 relationship types
- USDA integration: 400k+ foods across five data types
- ingredient parsing: 84% accuracy on common ingredients
- proof-of-concept semantic search for recipe similarity
what's hard
1. measurement ambiguity — 84% accuracy hits a wall. the remaining 16% requires context. "add garlic to taste" — is that 1 clove or 5? depends on recipe type, cuisine, other ingredients. needs a probabilistic inference layer on top of the graph.
2. cooking method coverage — published research covers roasting, boiling, steaming for chicken, beef, fish. but sous vide? pressure cooking? air frying? fermentation? the long tail lacks systematic data. collecting research. gaps remain.
3. ontology — the hardest part isn't tech. it's deciding how to categorize food. is tofu a protein or vegetable? depends on dietary framework. vegan athlete = primary protein. bodybuilder = incomplete protein with low bioavailability. the graph needs to represent ambiguity, not force premature resolution.
4. multi-language — english only right now. german, spanish, italian have rich culinary traditions. "quark" in german isn't "quark" in english—it's fresh cheese with no direct equivalent. not translation problems. cultural mapping problems.
5. latency — target is under 3 seconds for recipe analysis. current bottleneck: LLM extraction (1.5-2s). investigating model distillation, speculative decoding, cached embeddings for common ingredients.
the numbers
current accuracy: 84% field-level on core fields (ingredient, quantity, unit).
error distribution:
- ambiguous brand references: 34%
- regional terminology: 28%
- novel/fusion ingredients: 22%
- unclear prep states: 16%
target: 95%+. the gap requires better context models and more training data.
why this matters
i built the first version for myself. wanted to track bottura's tortellini without an hour of math. wanted to optimize longevity without giving up food i actually enjoy.
but somewhere in the process, i realized: this isn't a personal tool. it's infrastructure.
the knowledge exists:
- USDA: 400k+ foods, peer-reviewed, constantly updated
- branded databases: 350k+ commercial products
- academic research: cooking method effects on bioavailability
it's just completely disconnected from where people actually find recipes. youtube. instagram. tiktok. blogs. cookbooks.
meal metrics ai is the missing layer.
not another calorie-counting app. those exist. they mostly fail because manual entry sucks.
this is the intelligence layer connecting unstructured recipe content to structured nutritional science.
an API that takes a bottura video as input and returns scientifically-grounded nutritional data as output.
that changes everything.
imagine:
- recipe apps that auto-calculate macros for any recipe you save
- meal planning that optimizes for protein while respecting what you actually want to eat
- fitness apps with real food intake, not rough estimates
- health platforms connecting diet to biomarkers with precision
this is what becomes possible when the knowledge graph exists.
limitations
technical
1. accuracy ceiling — 84% means 16% of estimates contain errors. for casual tracking, probably fine. for medical nutrition therapy—diabetes, renal diets, PKU—potentially dangerous. the system must communicate uncertainty. never present estimates as clinical-grade.
2. language scope — english only. german, spanish, italian need culturally-specific training. "quark" in german ≠ "quark" in english. not translation. cultural mapping.
3. cooking method gaps — published research covers the big methods for common proteins. the long tail—sous vide, pressure cooking, air frying, fermentation—lacks systematic data.
4. individual variation — population averages. real absorption varies by gut microbiome, genetics, medications, concurrent food intake. someone with celiac absorbs the same meal differently.
5. no clinical validation — not validated in medical settings. don't use for medical decisions without professional oversight.
ethical
eating disorder risk — research found 73% of MyFitnessPal users with eating disorders said the app contributed to their condition. calorie tracking correlates with eating concern and dietary restraint.
the tension: precise tracking enables informed decisions for many. but can trigger disordered patterns in vulnerable populations.
the design requirements:
- no gamification (streaks, achievements, red/green judgments)
- never frame eating as "good" or "bad" based on numbers
- clear off-ramps when usage patterns look concerning
- opt-in hiding of specific metrics
data privacy — recipe history reveals sensitive info. dietary restrictions → health conditions. ingredient patterns → religious practices. meal timing → work schedules. this data needs health-record-level protection.
creator attribution — extracting structured data from recipes raises IP questions. facts (ingredients, quantities) aren't copyrightable. creative expression may be. the system extracts factual info without reproducing creative content. boundary deserves attention.
algorithmic guidance — any system influencing what people eat carries responsibility. we explicitly avoid:
- medical or disease-related claims
- restrictive diet recommendations without professional context
- optimizing single metrics at expense of balance
- creating dependency on tracking
what's next
now
- multi-language — german and spanish first. culturally-native annotation teams.
- cooking method coverage — partnering with food science researchers. sous vide, air frying, pressure cooking, fermentation.
- active learning — auto-identify low-confidence extractions for human review. continuously improve edge cases.
- confidence calibration — make sure stated confidence matches actual accuracy.
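calibration here means: among extractions scored 0.8, about 80% should turn out correct. a minimal sketch of binned expected calibration error over (confidence, was_correct) pairs; the inputs in the test are synthetic, illustrative only:

```python
def calibration_error(preds: list[tuple[float, bool]], bins: int = 5) -> float:
    """expected calibration error: per-bin |mean confidence - accuracy|,
    weighted by bin size. 0.0 means stated confidence matches reality."""
    buckets: list[list[tuple[float, bool]]] = [[] for _ in range(bins)]
    for conf, correct in preds:
        idx = min(int(conf * bins), bins - 1)  # clamp conf == 1.0 into last bin
        buckets[idx].append((conf, correct))
    total, err = len(preds), 0.0
    for bucket in buckets:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        err += len(bucket) / total * abs(avg_conf - accuracy)
    return round(err, 3)
```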
later
- public API — tiered pricing for recipe apps, meal planners, fitness platforms.
- platform integrations — youtube, instagram, tiktok for seamless recipe import.
- mobile SDK — on-device extraction for privacy-sensitive use cases.
- barcode scanning — extend beyond recipes to packaged food.
vision
- real-time video analysis — extract ingredients as cooking videos play. don't wait for completion.
- biomarker correlation — connect meals to CGMs, sleep trackers, wearables. build personalized nutrition response models.
- recipe modification — "swap heavy cream for greek yogurt to hit your protein target while reducing saturated fat." actionable suggestions that respect the dish.
- proactive insights — pattern recognition across meal history. "your energy dips when lunch is low in protein." "you consistently underestimate pasta portions."
the point
nutritional data fragmentation isn't a technical limitation. it's an architecture failure.
we have:
- comprehensive databases (USDA)
- robust bioavailability research
- billions of recipe videos
they just don't talk to each other.
meal metrics ai builds intelligence that adapts to content as it exists. not waiting for the world to standardize. a graph RAG system that understands food through relationships—ingredient → preparation → cooking method → final nutritional outcome.
current state: 84% accuracy. clear path to 95%+.
research ahead: measurement ambiguity. cooking method coverage. ontological flexibility. multi-language support.
the impact goes beyond personal convenience. a universal layer connecting recipe content to nutritional science enables:
- AI nutrition coaching that works with what you want to eat
- meal planning respecting culinary tradition while optimizing health
- diet tracking that captures reality, not rough estimates
for anyone who's ever tried to track a recipe they actually enjoyed cooking—this should have existed ten years ago.
i'm building it now.
references
- USDA Agricultural Research Service. (2024). FoodData Central. U.S. Department of Agriculture. https://fdc.nal.usda.gov/
- Oz, F., Aksu, M.I., & Turan, M. (2017). A Comparison of the Essential Amino Acid Content and the Retention Rate by Chicken Part according to Different Cooking Methods. Korean Journal for Food Science of Animal Resources, 37(5), 739-749.
- Deng, Y. et al. (2022). Applications of knowledge graphs for food science and industry. Patterns, 3(5), 100484.
- Haussmann, S. et al. (2019). FoodKG: A Semantics-Driven Knowledge Graph for Food Recommendation. International Semantic Web Conference (ISWC).
- FoodOn Consortium. (2024). FoodOn: A farm to fork ontology. https://foodon.org/
- Anthropic. (2025). Structured Outputs Documentation. https://docs.anthropic.com/en/docs/build-with-claude/structured-outputs
- Anthropic. (2025). Claude Sonnet 4.5 Model Card. San Francisco: Anthropic.
- Radford, A. et al. (2023). Robust Speech Recognition via Large-Scale Weak Supervision. Proceedings of the 40th International Conference on Machine Learning (ICML).
- Popovski, G. et al. (2024). Deep Learning Based Named Entity Recognition Models for Recipes. Proceedings of the 2024 Joint International Conference on Computational Linguistics (LREC-COLING).
- Chia, Y.K. et al. (2022). Enhancing Food Ingredient Named-Entity Recognition with Recurrent Network-Based Ensemble (RNE) Model. Applied Sciences, 12(20), 10310.
- Palermo, M. et al. (2014). A review of the impact of preparation and cooking on the nutritional quality of vegetables and legumes. International Journal of Gastronomy and Food Science, 3, 2-11.
- Rinaldi, M. et al. (2022). Cooking at home to retain nutritional quality and minimise nutrient losses. Trends in Food Science & Technology, 126, 227-241.
- Neo4j, Inc. (2024). Neo4j Graph Database Documentation. https://neo4j.com/docs/
- Pinecone Systems, Inc. (2024). Pinecone Vector Database Documentation. https://docs.pinecone.io/
- Levinson, C.A. et al. (2017). My Fitness Pal Calorie Tracker Usage in the Eating Disorders. Eating Behaviors, 27, 14-16.
- Simpson, C.C. & Mazzeo, S.E. (2017). Calorie counting and fitness tracking technology: Associations with eating disorder symptomatology. Eating Behaviors, 26, 89-92.
- Linardon, J. & Messer, M. (2019). My fitness pal usage in men: Associations with eating disorder symptoms and psychosocial impairment. Eating Behaviors, 33, 13-17.
- Murphy, E.W., Criner, P.E., & Gray, B.C. (1975). Comparisons of methods for calculating retentions of nutrients in cooked foods. Journal of Agricultural and Food Chemistry, 23(6), 1153-1157.
- Tian, J. et al. (2024). Core reference ontology for individualized exercise prescription (EXMO). Scientific Data, 11, 1319.
- Transparency-One. (2024). Tracing the World's Food Supply from Farm to Fork with Neo4j. Neo4j Case Study.
- OpenAI. (2024). Whisper Large v3 Model Card. Hugging Face.
- Gupta, S. et al. (2024). Building FKG.in: A Knowledge Graph for Indian Food. Formal Ontology in Information Systems (FOIS 2024). arXiv:2409.00830.
- Bai, Y. et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073. Anthropic.
- Wang, W. et al. (2024). Nutrition-Related Knowledge Graph Neural Network for Food Recommendation. Foods, 13(13), 2111.
last updated: Dec 2025