🍽️ Model Card for Food Vision Model

This model is an image classification model trained to identify different types of food from images. It was developed as part of a Food Vision project, utilizing transfer learning on a pre-trained convolutional neural network.

Model Details

Model Description

This model is a deep learning model for classifying food images into one of 101 categories from the Food101 dataset. It was trained using TensorFlow and employs a transfer learning approach, leveraging the features learned by a model pre-trained on a large dataset like ImageNet. The training process included the use of mixed precision for potentially faster training and reduced memory usage.

⚛️ HuggingFace Space for Food Vision Model

Use it here

Developed by: Recompense Me!
Model type: Image Classification (Transfer Learning with a CNN backbone)
Language(s) (NLP): N/A (Image Classification)
License: MIT
Finetuned from model: EfficientNetB0

Uses

This model is intended for classifying images of food into 101 distinct categories. Potential use cases include:

Food recognition in mobile applications.
Organizing food images in databases.
Assisting in dietary tracking or recipe suggestions based on images.

Limitations

Dataset Bias: The model is trained on the Food101 dataset. Its performance may degrade on food images that are significantly different in style, presentation, or origin from those in the training data.
Image Quality: Performance can be affected by image quality, lighting conditions, occlusions, and variations in food presentation.
Specificity: While it classifies into 101 categories, it may not distinguish between very similar dishes or variations within a category.

Evaluation

The model's performance was evaluated using standard classification metrics on a validation set from the Food101 dataset.

Testing Data

The model was evaluated on the validation split of the Food101 dataset.

Food101 Dataset: 101 food categories with 101,000 images in total, split into 750 training images and 250 test images per class.
Source: TensorFlow Datasets

Factors

Evaluation was performed on the overall validation dataset. Further analysis could involve disaggregating performance by individual food categories to identify classes where the model performs better or worse.
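
The per-category breakdown suggested above could be computed along these lines. This is a minimal sketch, not the evaluation code used for this model: the function name `per_class_accuracy` and the toy labels are hypothetical, and it assumes true and predicted labels are available as integer class indices.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Return, for each class, the fraction of its images that were predicted correctly."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    # Number of images per actual class.
    totals = np.bincount(y_true, minlength=num_classes)
    # Number of correctly predicted images per actual class.
    correct = np.bincount(y_true[y_true == y_pred], minlength=num_classes)
    # Classes with no images get NaN instead of a division error.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(totals > 0, correct / totals, np.nan)

# Hypothetical labels for six images across three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = per_class_accuracy(y_true, y_pred, num_classes=3)

# Class indices ordered from worst to best accuracy.
worst_first = np.argsort(acc)
print(acc, worst_first)
```

Sorting the result makes it easy to list the categories that need more data or closer inspection.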

Metrics

The primary evaluation metric used is Accuracy. A confusion matrix was also generated to visualize per-class performance.

Accuracy: The proportion of correctly classified images out of the total number of images evaluated.

$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$
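
As a small illustration of the formula, accuracy can be computed directly from label arrays. The helper name `accuracy` and the example labels are assumptions made for this sketch.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))

# Hypothetical class indices for six images: four of the six predictions match.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 2]
print(accuracy(y_true, y_pred))
```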

Confusion Matrix: A table that visualizes the performance of a classification model. Each row represents the instances in an actual class, while each column represents the instances in a predicted class.
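
The row and column convention can be made concrete with a short NumPy sketch. The function below is a hypothetical stand-in written for illustration; Scikit-learn's `sklearn.metrics.confusion_matrix` follows the same convention (rows are actual classes, columns are predicted classes).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix with rows = actual class and columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (row, column) pair repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Hypothetical labels for six images across three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
```

Diagonal entries count correct predictions; an off-diagonal entry at row i, column j counts images of class i that were predicted as class j.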

Results

Validation accuracy fluctuated between 70% and 80%.

Summary

Transfer learning helped the model reach higher accuracy, but the model struggled to tell closely related foods apart, which indicates that more data is needed. The dataset is large, yet more examples are still required to differentiate between similar-looking dishes.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Tesla T4
Hours used: 1 hour (estimated maximum)
Cloud Provider: Google Cloud
Compute Region: us-central
Carbon Emitted: 80 grams of CO₂eq (estimated)
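
For readers who want to see the general shape of such an estimate, the sketch below multiplies energy use by a grid carbon intensity. It is an illustration under stated assumptions, not the calculator's actual method: the function name is hypothetical, 70 W is the T4's rated board power, and the carbon intensity of 500 g CO₂eq/kWh is a placeholder. The card does not state which factors produced its 80 g figure, so this sketch is not expected to reproduce it.

```python
def estimate_emissions_g(power_watts, hours, carbon_intensity_g_per_kwh, pue=1.0):
    """Rough CO2eq estimate in grams: energy used (kWh) times grid carbon intensity.

    pue is the data-center power usage effectiveness; 1.0 ignores cooling and
    other overhead.
    """
    energy_kwh = power_watts / 1000.0 * hours * pue
    return energy_kwh * carbon_intensity_g_per_kwh

# 1 hour comes from the card; the carbon intensity is a hypothetical placeholder.
estimate = estimate_emissions_g(
    power_watts=70,
    hours=1.0,
    carbon_intensity_g_per_kwh=500.0,
)
print(f"{estimate:.1f} g CO2eq")
```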

Technical Specifications

Model Architecture and Objective

The model is a fine-tuned convolutional neural network (CNN) classifier. Mixed precision training was used to speed up training, which requires a modern CNN architecture compatible with float16 data types. The objective is to minimize the classification loss (e.g., categorical cross-entropy) to accurately predict the food category given an image.
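
A setup of this kind could look roughly like the following Keras sketch. It is an assumption-laden illustration rather than the actual training code: the function name `build_food_classifier`, the frozen backbone, the pooling head, and the Adam optimizer are choices made for this example, and it uses the sparse variant of categorical cross-entropy, which assumes integer labels.

```python
import tensorflow as tf

def build_food_classifier(num_classes=101, img_size=224, weights="imagenet"):
    """Build an EfficientNetB0 feature extractor with a new classification head."""
    # Compute in float16 where possible while keeping variables in float32.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    # Pre-trained backbone without its original ImageNet classification head.
    base = tf.keras.applications.EfficientNetB0(include_top=False, weights=weights)
    base.trainable = False  # feature extraction: only the new head is trained

    inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3), name="input_image")
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(num_classes)(x)
    # Keep the softmax output in float32 for numerical stability.
    outputs = tf.keras.layers.Activation("softmax", dtype="float32")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        optimizer=tf.keras.optimizers.Adam(),
        metrics=["accuracy"],
    )
    return model
```

Calling `build_food_classifier()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random initialization.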

Compute Infrastructure

The model was trained using a Tesla T4 GPU on Google Cloud in the us-central region. The estimated carbon emissions for 1 hour of training time on this setup are 80 grams of CO₂eq. The environment was intended to support mixed precision training.

Software

TensorFlow
TensorFlow Datasets
NumPy
Matplotlib
Scikit-learn
Helper functions from helper_functions.py (for plotting, data handling)

Usage

Here's an example of how to use the model for inference on a new image.

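First, make sure TensorFlow, Keras 3, JAX (the backend selected in the example), and huggingface_hub (needed to load models from hf:// paths) are installed:

pip install tensorflow keras jax huggingface_hub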

Then, you can load the model and make a prediction:

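import os

# Select the Keras backend BEFORE importing keras; setting it afterwards has no effect.
# Available backend options are: "jax", "torch", "tensorflow".
os.environ["KERAS_BACKEND"] = "jax"

import keras
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt  # only needed for the optional display step

# Download and load the saved model from the Hugging Face Hub
loaded_model = keras.saving.load_model("hf://Recompense/FoodVision")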

# Define the class names: the 101 Food101 labels in the order used by TensorFlow Datasets.
# The list must have one entry per model output; confirm the order against
# ds_info.features["label"].names from your own training run.
# To test the model initially, upload an image of any class listed here.
class_names = ['apple_pie', 'baby_back_ribs', 'baklava', 'beef_carpaccio', 'beef_tartare', 'beet_salad', 'beignets', 'bibimbap', 'bread_pudding', 'breakfast_burrito', 'bruschetta', 'caesar_salad', 'cannoli', 'caprese_salad', 'carrot_cake', 'ceviche', 'cheesecake', 'cheese_plate', 'chicken_curry', 'chicken_quesadilla', 'chicken_wings', 'chocolate_cake', 'chocolate_mousse', 'churros', 'clam_chowder', 'club_sandwich', 'crab_cakes', 'creme_brulee', 'croque_madame', 'cup_cakes', 'deviled_eggs', 'donuts', 'dumplings', 'edamame', 'eggs_benedict', 'escargots', 'falafel', 'filet_mignon', 'fish_and_chips', 'foie_gras', 'french_fries', 'french_onion_soup', 'french_toast', 'fried_calamari', 'fried_rice', 'frozen_yogurt', 'garlic_bread', 'gnocchi', 'greek_salad', 'grilled_cheese_sandwich', 'grilled_salmon', 'guacamole', 'gyoza', 'hamburger', 'hot_and_sour_soup', 'hot_dog', 'huevos_rancheros', 'hummus', 'ice_cream', 'lasagna', 'lobster_bisque', 'lobster_roll_sandwich', 'macaroni_and_cheese', 'macarons', 'miso_soup', 'mussels', 'nachos', 'omelette', 'onion_rings', 'oysters', 'pad_thai', 'paella', 'pancakes', 'panna_cotta', 'peking_duck', 'pho', 'pizza', 'pork_chop', 'poutine', 'prime_rib', 'pulled_pork_sandwich', 'ramen', 'ravioli', 'red_velvet_cake', 'risotto', 'samosa', 'sashimi', 'scallops', 'seaweed_salad', 'shrimp_and_grits', 'spaghetti_bolognese', 'spaghetti_carbonara', 'spring_rolls', 'steak', 'strawberry_shortcake', 'sushi', 'tacos', 'takoyaki', 'tiramisu', 'tuna_tartare', 'waffles']

# Create a function to load and prepare images (from your notebook)
def load_prep_image(filepath, img_shape=224, scale=True):
    """
        Reads in an image and preprocesses it for model prediction

        Args:
            filepath (str): path to target image
            img_shape (int): shape to resize image to. Default = 224
            scale (bool): Condition to scale image. Default = True

        Returns:
            Image Tensor of shape (img_shape, img_shape, 3)
    """
    image = tf.io.read_file(filepath)
    image_tensor = tf.io.decode_image(image, channels=3)
    image_tensor = tf.image.resize(image_tensor, [img_shape, img_shape])
    if scale:
        # Scale image tensor to be between 0 and 1
        scaled_image_tensor = image_tensor / 255.
        return scaled_image_tensor
    else:
        return image_tensor

# Load and preprocess a sample image
# Replace 'path/to/your/image.jpg' with the actual path to your image
sample_image_path = 'path/to/your/image.jpg'
prepared_image = load_prep_image(sample_image_path)

# Add a batch dimension to the image
prepared_image = tf.expand_dims(prepared_image, axis=0)

# Make a prediction
predictions = loaded_model.predict(prepared_image)

# Get the predicted class index
predicted_class_index = np.argmax(predictions)

# Get the predicted class name
predicted_class_name = class_names[predicted_class_index]

# Print the prediction
print(f"The predicted food item is: {predicted_class_name}")

# Optional: Display the image
# img = plt.imread(sample_image_path)
# plt.imshow(img)
# plt.title(f"Prediction: {predicted_class_name}")
# plt.axis('off')
# plt.show()
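
Because the model can confuse similar-looking dishes, it may help to inspect the runner-up classes rather than only the top prediction. The helper below is a hypothetical sketch (the name `top_k_predictions` and the four-class toy example are not part of the model's code); it assumes the model output is a vector of per-class probabilities.

```python
import numpy as np

def top_k_predictions(probabilities, class_names, k=5):
    """Return the k most probable (class_name, probability) pairs, best first."""
    probs = np.asarray(probabilities).reshape(-1)  # drop the batch dimension
    if probs.shape[0] != len(class_names):
        raise ValueError("class_names must have one entry per model output")
    top = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in top]

# Hypothetical output for a four-class model, shaped like a batch of one image.
example_probs = np.array([[0.10, 0.60, 0.05, 0.25]])
example_names = ["pizza", "steak", "sushi", "tacos"]
for name, prob in top_k_predictions(example_probs, example_names, k=2):
    print(f"{name}: {prob:.2f}")
```

With the real model, the same call would take `predictions` and the 101-entry `class_names` list from the example above; the length check also catches a label list that does not match the number of model outputs.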
