Oh, Snap! Scientists Are Turning People's Food Photos Into Recipes

NPR | By Laurel Dalrymple

Published August 2, 2017 at 12:13 PM EDT

You already know what all of your friends are eating, so you might as well know how to make it, too.

When someone posts a photo of food on social media, do you get cranky? Is it because you just don't care what other people are eating? Or is it because they're enjoying an herb-and-garlic crusted halibut at a seaside restaurant while you sit at your computer with a slice of two-day-old pizza?

Maybe you'd like to have what they're having, but don't know how to make it. If only there were a way to get their recipe without commenting on the photo.

Researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) would like that for you, too. That's why they're creating an artificial neural network, a computer system modeled after the human brain, to examine those photos and break them down into recipes.

The growth of the Internet has supported the ability to collect and publish several large-scale datasets, allowing for great advances in the field of artificial intelligence (AI), says Javier Marin, a postdoctoral research associate at CSAIL and co-author of a paper published this July at the Conference on Computer Vision and Pattern Recognition in Honolulu.

"However, when it comes to food, there was not any large-scale dataset available in the research community until now," Marin says. "There was a clear need to better understand people's eating habits and dietary preferences."

To do this, researchers have been feeding the computer pairs of photos and their corresponding recipes — about 800,000 of them. The AI network, called Recipe 1M, chews on all of that for a while, learning patterns and connections between the ingredients in the recipes and the photos of food.

"What we've developed is a novel machine learning model that powers an app. The that you see is just a pretty interface to that model," says Nicholas Hynes, an MIT graduate student at CSAIL who also co-authored the paper.

You, too, can try out this interface, called Pic2Recipe. To use it, just upload your food photo. The computer will analyze it and retrieve the recipe that best matches your image from a collection of test recipes.

It usually works pretty well, although it can sometimes miss an ingredient or two. Take, for example, this video, in which the MIT team uploads a photo of sugar cookies.

"The app took the image, figured out what was in it and how it was prepared, and gave us the recipe that it thinks was most likely to have produced the image," says Hynes.

Pic2Recipe did correctly identify eight out of the 11 ingredients. And it did accurately find a recipe for sugar cookies. Alas, it missed the icing.

But the program doesn't need to visually recognize every ingredient in the photo to find an accurate recipe.

"Just like a human, it can infer the presence of invisible, homogenized or obscured ingredients using context. For instance, if I see a green colored soup, it probably contains peas — and most definitely salt!" says Hynes. "When the model finds the best match, it's really taking a holistic view of the entire image or the entire recipe. That's part of why the model is interesting: It learns a lot about recipes in a very unstructured way."

But as with every new technology, there are some kinks to work out.

The current model sometimes has trouble making fine distinctions between similar recipes, Hynes says. "For instance, it may detect a ham sandwich as pastrami or not recognize that brioche contains milk and egg. We're still actively improving the vision portion of the model."

Another issue, Hynes says, is that the current model has no explicit knowledge of basic concepts like flavor and texture. "Without this, it might replace one ingredient with another because they're used in similar contexts, but doing so would significantly alter this dish," Hynes says. "For example, there are two very similar Korean fermented ingredients called gochujang and doenjang, but the former is spicy and sweet while the latter is savory and salty."

There are other refinements to be made, such as how to recognize an ingredient as diced, chopped or sliced. Or how to tell the difference between different types of mushrooms or tomatoes.

And when a reporter at The Verge tried the demo, photos of ramen and potato chips turned up no matches. How could the program miss such basics?

"This is simply explained by not having recipes for those foods in the dataset," Hynes says. "For things like ramen and potato chips, people generally don't post recipes for things that come out of a bag."

In the future, the MIT researchers want to do more than just let you have what they're having. They are seeking insight into health and eating habits.

"Determining the ingredients — and therefore how healthy they are — of images posted in a specific region, we could see how health habits change through time," says Marin.

Hynes would like to take the technology a step further, and is working on a way to automatically link from an image or ingredient list to nutrition information.

"Using it to improve peoples' health is definitely big; when I go to community/potluck dinners, it always astonishes me how people don't pay attention to preparation and how it relates to plausible serving sizes," he says.

Hynes also can see how aspiring cooks might appreciate a system that takes a restaurant item and tells them how to make it. "Even everyday people with dietary restrictions — gluten free, vegan, sparse pantry — would appreciate a tool that could minimally modify a complicated dish like Beef Wellington so that it fits the constraints."

And why stop there? These are MIT scientists, after all, collaborating with researchers from other institutions, including one in Spain.

"In the far future, one might envision a robo-chef that fully understands food and does the cooking for you!" Hynes says.