meal metrics ai
grandma's recipes don't come with nutrition labels. an api that structures any recipe and calculates macros.
the origin
it started with massimo bottura's tortellini in crema di parmigiano.
i love cooking. not meal prep—cooking. the kind where you spend three hours on a single dish because the process matters as much as the result. but i also track everything i eat. not out of restriction—out of curiosity. longevity isn't about deprivation. it's about understanding what you're putting into your body.
so there i was. this beautiful dish—hand-folded tortellini, aged parmigiano cream, a hint of nutmeg. i wanted to log it.
and i hit a wall.
the recipe said "a generous handful of parmigiano." how much is generous? 30 grams? 80? "reduce the cream until it coats the spoon"—how much cream is left after reduction? the tortellini filling was mortadella, prosciutto, pork loin in unspecified ratios.
i spent an hour reverse-engineering the macros of a dish i'd just spent three hours perfecting.
the irony: i could execute a michelin-level recipe, but couldn't tell you its protein content within 20 grams.
this isn't a one-time problem. it happens every day, to anyone who refuses to choose between eating well and eating informed.
the bigger picture
that frustration sent me down a rabbit hole. and what i found was worse than i expected.
the numbers:
- 510 million people follow #fitness on Instagram
- billions of recipe videos across YouTube, TikTok, Instagram
- 400,000+ foods in the USDA database
- zero standardized way to connect any of it
every recipe exists in isolation. every food database exists in isolation. the knowledge is there. it's just trapped.
look at what recipe creators actually write:
| what they write | what it means | calorie variance |
|---|---|---|
| "a splash of olive oil" | 5ml or 30ml? | 0-240 kcal |
| "season generously" | how much salt? | unknown |
| "reduce until thick" | how much is left? | 30-50% concentration |
| "add garlic to taste" | 1 clove or 5? | minimal, but... |
and it gets worse.
cooking methods change everything. research in the Korean Journal for Food Science of Animal Resources found that boiling chicken reduces protein by up to 23% compared to roasting. the protein literally leaches into the water.
one method choice. double-digit macro difference.
this isn't just my problem. athletes optimizing performance. people managing chronic conditions. anyone trying to understand what they're actually eating.
the infrastructure doesn't exist.
the insight
design background. i learned to see systems before screens.
and nutrition, i realized, isn't a database problem. it's a graph problem.
traditional calorie trackers think: "100g chicken breast = 31g protein." done.
but that's not how food works. look at what actually happens with bottura's tortellini:
| element | what a database sees | what actually matters |
|---|---|---|
| parmigiano | 392 kcal/100g | age affects moisture → weight → calories |
| cream | 340 kcal/100ml | reduction concentrates fat → volume unknown |
| pork filling | sum of parts | mortadella:prosciutto ratio changes protein density |
| cooking | ignored | pasta absorbs water → weight changes → serving math breaks |
the USDA tells me 100g parmigiano-reggiano contains 35.8g protein.
what it can't tell me: how much parmigiano ends up in my dish when the recipe says "generous handful."
a database stores facts in isolation. a graph stores relationships—ingredients, preparations, cooking methods, final nutritional outcomes.
the difference between "what is chicken breast?" and "what happens to chicken breast when i roast it versus boil it, and how does that change if i marinate it first?"
this is where meal metrics ai diverges from every calorie counter ever built.
not another lookup tool. a knowledge graph that understands food the way a nutritionist does—through relationships, context, inference.
the problem isn't missing data. it's missing connections.
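the facts-versus-relationships distinction fits in a few lines. a toy triple store; the retention values come from the chicken table later in this post, but the node names and functions are illustrative, not the production schema:

```python
# a toy triple store: (subject, relation, object) rows. a flat fact table can
# answer "what is chicken breast?"; the triples can answer "what happens to it?"
TRIPLES = [
    ("chicken breast", "COOKED_BY", "roasting"),
    ("chicken breast", "COOKED_BY", "boiling"),
    ("roasting", "PROTEIN_RETENTION", 1.00),
    ("boiling", "PROTEIN_RETENTION", 0.77),
]

def neighbors(subject: str, relation: str) -> list:
    """follow one relation out of a node."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

def retention(ingredient: str, method: str):
    """two-hop traversal: ingredient -> cooking method -> retention coefficient."""
    if method in neighbors(ingredient, "COOKED_BY"):
        return neighbors(method, "PROTEIN_RETENTION")[0]
    return None  # relationship not in the graph
```

the lookup-table answer never changes. the traversal answer depends on the path you take through it.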
the science
before building, i needed to understand how food knowledge is formally structured. and why existing approaches fail.
food ontologies
researchers have tried to formalize food knowledge into machine-readable structures:
FoodOn — the most comprehensive open-source food ontology. vocabulary for nutritional analysis, chemical components, agricultural practices, food safety. supports FAIR data annotation across academia, government, commercial.
FoodKG — unifies multiple ontologies with recipe instances from Recipe1M+ and USDA nutrient records. published at ISWC 2019. demonstrates how knowledge graphs can support recipe recommendations, ingredient substitutions, nutritional Q&A.
ONS (Ontology for Nutritional Studies) — integrates terminology across dietary research. FOBI (Food-Biomarker Ontology) — links foods to metabolites.
the gap none of them address: real-time extraction from unstructured content.
they assume structured input. the world produces unstructured content.
the chemistry of cooking
understanding nutrient retention means understanding what happens at the molecular level.
protein denaturation — begins at 40°C. denatured proteins may be more digestible (unfolded structures = more surface area for enzymes) but can cross-link under prolonged heat, reducing bioavailability.
maillard reaction — browning above 140°C creates new compounds from amino acids and reducing sugars. great for flavor. but those amino acids are "lost" nutritionally.
water-soluble nutrient leaching — explains why boiled chicken has less protein than roasted. proteins denature and dissolve into cooking liquid. if you're not drinking the broth, you're losing nutrients. research quantified this: boiled chicken breast retains only 77% of protein vs 100% for roasted.
fat-soluble vitamin retention — opposite pattern. cooking with fat increases absorption of vitamins A, D, E, K. sautéed vegetables may deliver more bioavailable nutrients than raw.
this is why "100g chicken = 31g protein" fails. the cooking method is part of the equation.
named entity recognition for recipes
extracting structured data from recipe text is a specific NLP challenge.
BiLSTM-CRF architectures — combine bidirectional LSTMs (past and future context) with Conditional Random Fields (enforce valid label sequences). the RNE method achieves F1 scores of 96.09% on ingredient extraction.
BERT variants — DistilBERT, RoBERTa fine-tuned for recipe NER. best models hit F1 ≥ 0.95 on standard datasets.
but these models train on clean, well-formatted recipes.
real-world content—instagram captions, handwritten notes, video transcriptions—is messier. performance degrades significantly.
the gap between academic benchmarks and production accuracy? that's the research challenge.
the hard problems
the goal: a system that extracts accurate nutritional data from any recipe format. these are the research questions:
1. ambiguous measurements — how do you convert "a handful" to grams? a bodybuilder's handful isn't a home cook's handful. "a cup" means 240ml in america, 250ml in metric countries.
2. context-dependent ingredients — "protein powder" could be whey isolate (25g protein/30g scoop), plant blend (18g/30g), or collagen (10g/30g). same words, different products. how do you infer the right one?
3. cooking method effects — 200+ cooking methods. published research covers the big ones (roasting, boiling, steaming) for common proteins. but sous vide? pressure cooking? fermentation? the long tail lacks data.
4. format normalization — recipes appear as videos, blog posts, instagram captions, cookbook scans, handwritten notes. how do you extract structured ingredients from all of this?
5. ontological ambiguity — is tofu a protein or a vegetable? both. depends on your dietary framework. the system needs to represent ambiguity, not force premature resolution.
these are open questions. the optimal solutions aren't known. that's what makes this research.
the architecture
three-stage pipeline. arbitrary recipe content → accurate nutritional intelligence.
- content ingestion — URLs, screenshots, text, video → LLM parsing, OCR, audio transcription
- ingredient resolution — "a handful of spinach" → "30g raw spinach" with confidence scoring
- graph query — Neo4j traversal across 50,000+ ingredients and 200+ cooking methods
output: complete macro/micronutrient breakdown with confidence scores and bioavailability estimates.
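a minimal sketch of how the three stages chain together. everything here is illustrative: the function names, the hardcoded heuristics, and the per-100g numbers stand in for the real parsers, resolver, and graph traversal.

```python
from dataclasses import dataclass

@dataclass
class ResolvedIngredient:
    name: str
    grams: float
    confidence: float

def ingest(raw: str) -> list[str]:
    # stage 1: normalize arbitrary content to a flat list of ingredient phrases.
    # the real pipeline routes URLs, images, and video through OCR or transcription first.
    return [line.strip() for line in raw.splitlines() if line.strip()]

def resolve(phrase: str) -> ResolvedIngredient:
    # stage 2: map vague phrases to gram estimates with a confidence score.
    # hardcoded heuristics stand in for the corpus-driven resolver.
    heuristics = {
        "a handful of spinach": ("spinach", 30.0, 0.87),
        "splash of olive oil": ("olive oil", 13.5, 0.72),  # ~15ml at ~0.9 g/ml
    }
    name, grams, conf = heuristics.get(phrase, (phrase, 0.0, 0.0))
    return ResolvedIngredient(name, grams, conf)

def query_graph(ing: ResolvedIngredient) -> dict:
    # stage 3: scale per-100g macros to the resolved weight;
    # a stand-in for the Neo4j traversal.
    macros_per_100g = {"spinach": {"protein": 2.9}, "olive oil": {"protein": 0.0}}
    base = macros_per_100g.get(ing.name, {})
    return {k: round(v * ing.grams / 100, 2) for k, v in base.items()}

recipe = "a handful of spinach\nsplash of olive oil"
results = [query_graph(resolve(p)) for p in ingest(recipe)]
```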
stage 1: content ingestion
extracting structured ingredient lists from unstructured content. the inputs:
text — blog posts, cookbook entries, social media captions. claude haiku for fast, cost-efficient parsing.
images — handwritten notes, cookbook scans, infographic recipes. OCR preprocessing → LLM extraction.
video — youtube, instagram reels, tiktok. whisper for audio transcription, visual analysis for on-screen text.
platform quirks — each source has unique conventions. normalize to common intermediate representation.
stage 2: ingredient resolution
the hardest problem isn't storing nutritional data. it's understanding what people mean.
"a handful of spinach":
- volume variance — hand size varies. could be 20-50g.
- preparation state — raw or cooked? packed or loose?
- context inference — smoothie = raw and packed. pasta sauce = wilted (more grams for the same volume).
multi-stage resolution:
| input | process | output | confidence |
|---|---|---|---|
| "a handful of spinach" | hand volume → 30g, context → smoothie → packed | 30g raw spinach | 0.87 |
| "splash of olive oil" | corpus analysis → 10-20ml, salad → 15ml | 15ml extra virgin | 0.72 |
| "medium onion, diced" | size standards → 100-120g, "medium" → 110g | 110g yellow onion | 0.94 |
| "protein powder" | context ambiguous, default common | 30g whey isolate | 0.61 |
the system learns measurement conventions across cuisines. "a cup" = 240ml american, 250ml metric. "chopped garlic" implies volume. "garlic cloves" implies count.
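the convention table is straightforward to encode. a sketch of region-aware cup conversion; the cup volumes come from the text, the densities are rough assumptions for illustration:

```python
# region-aware unit normalization. "a cup" is 240ml in american recipes,
# 250ml in metric ones.
CUP_ML = {"us": 240.0, "metric": 250.0}

# approximate densities in g/ml; illustrative values, not reference data
DENSITY_G_PER_ML = {
    "olive oil": 0.91,
    "milk": 1.03,
    "water": 1.00,
}

def cups_to_grams(cups: float, ingredient: str, region: str = "us") -> float:
    """convert a cup measure to grams using region + ingredient density."""
    ml = cups * CUP_ML[region]
    return round(ml * DENSITY_G_PER_ML.get(ingredient, 1.0), 1)
```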
current performance: 84% exact match on common ingredients.
the remaining 16% clusters in:
- ambiguous brand references (34%): "protein powder" without spec
- regional terminology (28%): "aubergine" vs "eggplant"
- novel ingredients (22%): "cauliflower rice," "zoodles"
- unclear prep states (16%): "cooked chicken" (boiled? roasted?)
stage 3: the knowledge graph
this is where meal metrics diverges from every nutritional database.
schema: 12 node types, 23 relationship types:
- ingredients → base nutritional profiles (USDA)
- cooking methods → nutrient retention coefficients (peer-reviewed)
- substitutions → equivalent mappings with nutritional deltas
- preparation states → raw, cooked, reduced, fermented
- recipe context → cuisine, meal category, dietary framework
example query: "if i roast instead of boil this chicken, how do macros change?"
```cypher
MATCH (chicken:Ingredient {name: "chicken breast"})-[:COOKED_BY]->(boiling:Method {name: "boiling"})
MATCH (chicken)-[:COOKED_BY]->(roasting:Method {name: "roasting"})
RETURN
  boiling.protein_retention  AS boiled_protein,   // 77%
  roasting.protein_retention AS roasted_protein,  // 100%
  roasting.protein_retention - boiling.protein_retention AS delta  // +23%
```
not a guess. calculated from published retention coefficients:
| method | breast | wing | leg |
|---|---|---|---|
| roasting | 100% | 94% | 100% |
| steaming | 98% | 95% | 96% |
| pan-frying | 95% | 89% | 93% |
| boiling | 77% | 83% | 77% |
counterintuitive: roasted chicken retains more protein than boiled.
the conventional wisdom that boiling is "healthier" ignores protein leaching into cooking liquid. if you're not drinking the broth, you're losing 23% of the protein you think you're eating.
the graph encodes this at scale. 200+ cooking methods. 50,000+ ingredients.
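outside the graph, the retention coefficients reduce the method question to one multiplication. a sketch using the chicken-breast numbers above:

```python
# protein retention coefficients for chicken breast, from the table above
RETENTION = {"roasting": 1.00, "steaming": 0.98, "pan-frying": 0.95, "boiling": 0.77}

USDA_PROTEIN_PER_100G = 31.0  # raw chicken breast, per the text

def cooked_protein(grams_raw: float, method: str) -> float:
    """protein actually delivered after cooking, given raw weight and method."""
    return round(grams_raw / 100 * USDA_PROTEIN_PER_100G * RETENTION[method], 1)
```

for a 200g raw breast, roasting delivers 62.0g of protein and boiling 47.7g: the 23% gap from the table, made concrete.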
the stack
| component | technology | why |
|---|---|---|
| graph database | Neo4j | native relationship queries, cypher, scales to millions |
| primary data | USDA FoodData Central | 400k+ foods, peer-reviewed, government-maintained |
| supplementary data | branded food APIs | commercial products, regional items |
| embeddings | custom fine-tuned | domain-specific food terminology |
| LLM | Claude Haiku 4.5 | fast parsing, structured output, cost-efficient |
| vector store | Pinecone | semantic search, ingredient fuzzy matching |
| validation | Pydantic | runtime type checking, confidence enforcement |
| API | FastAPI | REST interface, OpenAPI docs |
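the validation row does one job: refuse any extraction whose confidence leaves [0, 1]. a stdlib stand-in for the constraint; in the real stack this is a declarative `Field(ge=0.0, le=1.0)` on the pydantic model:

```python
from dataclasses import dataclass

@dataclass
class ResolvedIngredient:
    name: str
    grams: float
    confidence: float

    def __post_init__(self):
        # the checks pydantic's Field(gt=0) / Field(ge=0.0, le=1.0) enforce declaratively
        if self.grams <= 0:
            raise ValueError(f"grams must be positive: {self.grams}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
```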
why claude with structured outputs
LLM choice matters. recipe parsing needs consistently structured output. not just understanding language—producing data downstream systems can process without error handling for every edge case.
anthropic's structured outputs (december 2025) solve this:
1. guaranteed schema compliance — constrained decoding restricts token generation to valid JSON matching your schema. no parsing errors. no retry logic. no validation failures.
2. food safety — recipe content can contain dangerous advice ("add raw chicken to smoothie"). claude's constitutional AI training embeds safety constraints, reducing harmful misinformation propagation.
3. multi-model cost optimization:
| task | model | why |
|---|---|---|
| simple parsing | haiku 4.5 | fast (<500ms), cheap at volume |
| complex interpretation | sonnet 4.5 | better reasoning for ambiguous measurements |
| edge cases | sonnet 4.5 | max accuracy for novel ingredients |
| real-time API | haiku 4.5 | low latency |
4. native pydantic — SDK transforms pydantic models directly to JSON schemas.
in practice
```python
from anthropic import Anthropic
from pydantic import BaseModel


class Ingredient(BaseModel):
    name: str
    quantity: float
    unit: str
    preparation: str | None
    confidence: float


class RecipeExtraction(BaseModel):
    ingredients: list[Ingredient]
    cooking_method: str
    estimated_servings: int


client = Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Extract ingredients from this recipe:
        "Toss a generous handful of spinach with a splash of olive oil,
        a squeeze of lemon, and season generously with salt and pepper.
        Serves 2 as a side."
        """,
    }],
    output_format={
        "type": "json_schema",
        "json_schema": {
            "name": "recipe_extraction",
            "strict": True,
            "schema": RecipeExtraction.model_json_schema(),
        },
    },
)
```
response is guaranteed to match:
```json
{
  "ingredients": [
    {"name": "spinach", "quantity": 30, "unit": "g",
     "preparation": "raw", "confidence": 0.85},
    {"name": "olive oil", "quantity": 15, "unit": "ml",
     "preparation": null, "confidence": 0.72},
    {"name": "lemon juice", "quantity": 10, "unit": "ml",
     "preparation": "fresh squeezed", "confidence": 0.78},
    {"name": "salt", "quantity": 2, "unit": "g",
     "preparation": null, "confidence": 0.65},
    {"name": "black pepper", "quantity": 1, "unit": "g",
     "preparation": "ground", "confidence": 0.65}
  ],
  "cooking_method": "raw",
  "estimated_servings": 2
}
```
note the confidence scores. "generous handful" → 30g at 0.85. "season generously" → 2g salt at 0.65. the system is honest about uncertainty.
this eliminated 94% of extraction failures in testing.
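even with constrained decoding, a cheap shape check on the parsed response costs one function. a stdlib sketch; the production path would validate with the same pydantic models:

```python
import json

# the keys the schema guarantees on every ingredient
REQUIRED_INGREDIENT_KEYS = {"name", "quantity", "unit", "preparation", "confidence"}

def check_extraction(raw: str) -> dict:
    """parse the LLM response and assert the shape the schema guarantees."""
    data = json.loads(raw)
    assert {"ingredients", "cooking_method", "estimated_servings"} <= data.keys()
    for ing in data["ingredients"]:
        assert REQUIRED_INGREDIENT_KEYS <= ing.keys()
        assert 0.0 <= ing["confidence"] <= 1.0  # uncertainty stays bounded
    return data

sample = '''{"ingredients": [{"name": "spinach", "quantity": 30, "unit": "g",
 "preparation": "raw", "confidence": 0.85}],
 "cooking_method": "raw", "estimated_servings": 2}'''
parsed = check_extraction(sample)
```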
where it stands
iterating in public. the system is partially built. the open challenges define the research agenda.
what's working
- graph schema: 12 node types, 23 relationship types
- USDA integration: 400k+ foods across five data types
- ingredient parsing: 84% accuracy on common ingredients
- proof-of-concept semantic search for recipe similarity
what's hard
1. measurement ambiguity — 84% accuracy hits a wall. the remaining 16% requires context. "add garlic to taste" — is that 1 clove or 5? depends on recipe type, cuisine, other ingredients. needs a probabilistic inference layer on top of the graph.
2. cooking method coverage — published research covers roasting, boiling, steaming for chicken, beef, fish. but sous vide? pressure cooking? air frying? fermentation? the long tail lacks systematic data. collecting research. gaps remain.
3. ontology — the hardest part isn't tech. it's deciding how to categorize food. is tofu a protein or vegetable? depends on dietary framework. vegan athlete = primary protein. bodybuilder = incomplete protein with low bioavailability. the graph needs to represent ambiguity, not force premature resolution.
4. multi-language — english only right now. german, spanish, italian have rich culinary traditions. "quark" in german isn't "quark" in english—it's fresh cheese with no direct equivalent. not translation problems. cultural mapping problems.
5. latency — target is under 3 seconds for recipe analysis. current bottleneck: LLM extraction (1.5-2s). investigating model distillation, speculative decoding, cached embeddings for common ingredients.
the numbers
current accuracy: 84% field-level on core fields (ingredient, quantity, unit).
error distribution:
- ambiguous brand references: 34%
- regional terminology: 28%
- novel/fusion ingredients: 22%
- unclear prep states: 16%
target: 95%+. the gap requires better context models and more training data.
why this matters
i built the first version for myself. wanted to track bottura's tortellini without an hour of math. wanted to optimize longevity without giving up food i actually enjoy.
but somewhere in the process, i realized: this isn't a personal tool. it's infrastructure.
the knowledge exists:
- USDA: 400k+ foods, peer-reviewed, constantly updated
- branded databases: 350k+ commercial products
- academic research: cooking method effects on bioavailability
it's just completely disconnected from where people actually find recipes. youtube. instagram. tiktok. blogs. cookbooks.
meal metrics ai is the missing layer.
not another calorie-counting app. those exist. they mostly fail because manual entry sucks.
this is the intelligence layer connecting unstructured recipe content to structured nutritional science.
an API that takes a bottura video as input and returns scientifically-grounded nutritional data as output.
that changes everything.
imagine:
- recipe apps that auto-calculate macros for any recipe you save
- meal planning that optimizes for protein while respecting what you actually want to eat
- fitness apps with real food intake, not rough estimates
- health platforms connecting diet to biomarkers with precision
this is what becomes possible when the knowledge graph exists.
limitations
technical
1. accuracy ceiling — 84% means 16% of estimates contain errors. for casual tracking, probably fine. for medical nutrition therapy—diabetes, renal diets, PKU—potentially dangerous. the system must communicate uncertainty. never present estimates as clinical-grade.
2. language scope — english only. german, spanish, italian need culturally-specific training. "quark" in german ≠ "quark" in english. not translation. cultural mapping.
3. cooking method gaps — published research covers the big methods for common proteins. the long tail—sous vide, pressure cooking, air frying, fermentation—lacks systematic data.
4. individual variation — population averages. real absorption varies by gut microbiome, genetics, medications, concurrent food intake. someone with celiac absorbs the same meal differently.
5. no clinical validation — not validated in medical settings. don't use for medical decisions without professional oversight.
ethical
eating disorder risk — research found 73% of MyFitnessPal users with eating disorders said the app contributed to their condition. calorie tracking correlates with eating concern and dietary restraint.
the tension: precise tracking enables informed decisions for many. but can trigger disordered patterns in vulnerable populations.
the design requirements:
- no gamification (streaks, achievements, red/green judgments)
- never frame eating as "good" or "bad" based on numbers
- clear off-ramps when usage patterns look concerning
- opt-in hiding of specific metrics
data privacy — recipe history reveals sensitive info. dietary restrictions → health conditions. ingredient patterns → religious practices. meal timing → work schedules. this data needs health-record-level protection.
creator attribution — extracting structured data from recipes raises IP questions. facts (ingredients, quantities) aren't copyrightable. creative expression may be. the system extracts factual info without reproducing creative content. boundary deserves attention.
algorithmic guidance — any system influencing what people eat carries responsibility. we explicitly avoid:
- medical or disease-related claims
- restrictive diet recommendations without professional context
- optimizing single metrics at expense of balance
- creating dependency on tracking
what's next
now
- multi-language — german and spanish first. culturally-native annotation teams.
- cooking method coverage — partnering with food science researchers. sous vide, air frying, pressure cooking, fermentation.
- active learning — auto-identify low-confidence extractions for human review. continuously improve edge cases.
- confidence calibration — make sure stated confidence matches actual accuracy.
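calibration here means: among extractions scored 0.8, about 80% should turn out correct. a minimal sketch of binned expected calibration error over (confidence, was_correct) pairs; the inputs in the test are synthetic, illustrative only:

```python
def calibration_error(preds: list[tuple[float, bool]], bins: int = 5) -> float:
    """expected calibration error: per-bin |mean confidence - accuracy|,
    weighted by bin size. 0.0 means stated confidence matches reality."""
    buckets: list[list[tuple[float, bool]]] = [[] for _ in range(bins)]
    for conf, correct in preds:
        idx = min(int(conf * bins), bins - 1)  # clamp conf == 1.0 into last bin
        buckets[idx].append((conf, correct))
    total, err = len(preds), 0.0
    for bucket in buckets:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        err += len(bucket) / total * abs(avg_conf - accuracy)
    return round(err, 3)
```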
later
- public API — tiered pricing for recipe apps, meal planners, fitness platforms.
- platform integrations — youtube, instagram, tiktok for seamless recipe import.
- mobile SDK — on-device extraction for privacy-sensitive use cases.
- barcode scanning — extend beyond recipes to packaged food.
vision
- real-time video analysis — extract ingredients as cooking videos play. don't wait for completion.
- biomarker correlation — connect meals to CGMs, sleep trackers, wearables. build personalized nutrition response models.
- recipe modification — "swap heavy cream for greek yogurt to hit your protein target while reducing saturated fat." actionable suggestions that respect the dish.
- proactive insights — pattern recognition across meal history. "your energy dips when lunch is low in protein." "you consistently underestimate pasta portions."
the point
nutritional data fragmentation isn't a technical limitation. it's an architecture failure.
we have:
- comprehensive databases (USDA)
- robust bioavailability research
- billions of recipe videos
they just don't talk to each other.
meal metrics ai builds intelligence that adapts to content as it exists. not waiting for the world to standardize. a graph RAG system that understands food through relationships—ingredient → preparation → cooking method → final nutritional outcome.
current state: 84% accuracy. clear path to 95%+.
research ahead: measurement ambiguity. cooking method coverage. ontological flexibility. multi-language support.
the impact goes beyond personal convenience. a universal layer connecting recipe content to nutritional science enables:
- AI nutrition coaching that works with what you want to eat
- meal planning respecting culinary tradition while optimizing health
- diet tracking that captures reality, not rough estimates
for anyone who's ever tried to track a recipe they actually enjoyed cooking—this should have existed ten years ago.
i'm building it now.
references
- USDA Agricultural Research Service. (2024). FoodData Central. U.S. Department of Agriculture. https://fdc.nal.usda.gov/
- Oz, F., Aksu, M.I., & Turan, M. (2017). A Comparison of the Essential Amino Acid Content and the Retention Rate by Chicken Part according to Different Cooking Methods. Korean Journal for Food Science of Animal Resources, 37(5), 739-749.
- Deng, Y. et al. (2022). Applications of knowledge graphs for food science and industry. Patterns, 3(5), 100484.
- Haussmann, S. et al. (2019). FoodKG: A Semantics-Driven Knowledge Graph for Food Recommendation. International Semantic Web Conference (ISWC).
- FoodOn Consortium. (2024). FoodOn: A farm to fork ontology. https://foodon.org/
- Anthropic. (2025). Structured Outputs Documentation. https://docs.anthropic.com/en/docs/build-with-claude/structured-outputs
- Anthropic. (2025). Claude Sonnet 4.5 Model Card. San Francisco: Anthropic.
- Radford, A. et al. (2023). Robust Speech Recognition via Large-Scale Weak Supervision. Proceedings of the 40th International Conference on Machine Learning (ICML).
- Popovski, G. et al. (2024). Deep Learning Based Named Entity Recognition Models for Recipes. Proceedings of the 2024 Joint International Conference on Computational Linguistics (LREC-COLING).
- Chia, Y.K. et al. (2022). Enhancing Food Ingredient Named-Entity Recognition with Recurrent Network-Based Ensemble (RNE) Model. Applied Sciences, 12(20), 10310.
- Palermo, M. et al. (2014). A review of the impact of preparation and cooking on the nutritional quality of vegetables and legumes. International Journal of Gastronomy and Food Science, 3, 2-11.
- Rinaldi, M. et al. (2022). Cooking at home to retain nutritional quality and minimise nutrient losses. Trends in Food Science & Technology, 126, 227-241.
- Neo4j, Inc. (2024). Neo4j Graph Database Documentation. https://neo4j.com/docs/
- Pinecone Systems, Inc. (2024). Pinecone Vector Database Documentation. https://docs.pinecone.io/
- Levinson, C.A. et al. (2017). My Fitness Pal Calorie Tracker Usage in the Eating Disorders. Eating Behaviors, 27, 14-16.
- Simpson, C.C. & Mazzeo, S.E. (2017). Calorie counting and fitness tracking technology: Associations with eating disorder symptomatology. Eating Behaviors, 26, 89-92.
- Linardon, J. & Messer, M. (2019). My fitness pal usage in men: Associations with eating disorder symptoms and psychosocial impairment. Eating Behaviors, 33, 13-17.
- Murphy, E.W., Criner, P.E., & Gray, B.C. (1975). Comparisons of methods for calculating retentions of nutrients in cooked foods. Journal of Agricultural and Food Chemistry, 23(6), 1153-1157.
- Tian, J. et al. (2024). Core reference ontology for individualized exercise prescription (EXMO). Scientific Data, 11, 1319.
- Transparency-One. (2024). Tracing the World's Food Supply from Farm to Fork with Neo4j. Neo4j Case Study.
- OpenAI. (2024). Whisper Large v3 Model Card. Hugging Face.
- Gupta, S. et al. (2024). Building FKG.in: A Knowledge Graph for Indian Food. Formal Ontology in Information Systems (FOIS 2024). arXiv:2409.00830.
- Bai, Y. et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073. Anthropic.
- Wang, W. et al. (2024). Nutrition-Related Knowledge Graph Neural Network for Food Recommendation. Foods, 13(13), 2111.
last updated: Dec 2025