In recent years, image recognition has become an important topic in computer vision and image processing. It is also used in many fields, such as driverless vehicles, healthcare, face recognition, and search engines. With its increasing use in mobile cameras, image recognition algorithms now underpin many applications, such as navigation and dietary assessment.
Food image classification and recognition form a branch of computer vision. This field has attracted growing attention because of its critical role in helping people keep track of their nutrition, which in turn has a significant impact on human health.
Food image recognition based on convolutional neural networks (CNNs) has been widely applied, with high reported accuracy. This is largely because CNNs can extract features directly from images, which makes them an efficient tool for image recognition.
Therefore, to contribute to this field, this thesis is built around the DenseFood model, which is based on a densely connected convolutional network architecture consisting of initial layers, dense block layers, transition layers, and fully connected layers. Our model is not very deep; instead, we increase its width to improve performance. We use a convolution layer as the initial layer to extract as much information as possible from food images before feeding them into the dense blocks, and we use dense connectivity to extract new features. We also use max pooling to down-sample the feature maps and to extract the main features and food structures from the images. Furthermore, we use the ELU activation function to tackle the vanishing gradient problem and to speed up training.
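The building blocks named above can be illustrated with a toy, dependency-free sketch. It is not the DenseFood implementation: each feature map is collapsed to a single number, random linear maps stand in for convolutions, and the names `dense_layer`, `dense_block`, and `max_pool_2x2`, the growth rate, and the layer count are all hypothetical. It shows only the bookkeeping of dense connectivity (each layer sees the concatenation of everything before it, so a block with input width k0, L layers, and growth rate k outputs k0 + L*k features), the ELU activation, and 2x2 max-pool down-sampling. Where DenseFood places the max pooling relative to its transition layers is not specified here.

```python
import math
import random

def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise; the gradient stays nonzero for x < 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def dense_layer(features: list[float], growth_rate: int, rng: random.Random) -> list[float]:
    """Hypothetical stand-in for one conv layer: random linear maps from ALL
    incoming features to `growth_rate` new features, each followed by ELU."""
    new_features = []
    for _ in range(growth_rate):
        weights = [rng.uniform(-0.5, 0.5) for _ in features]
        new_features.append(elu(sum(w * f for w, f in zip(weights, features))))
    return new_features

def dense_block(x: list[float], num_layers: int, growth_rate: int, seed: int = 0) -> list[float]:
    """Dense connectivity: every layer receives the concatenation of the block
    input and all earlier layers' outputs, and its own output is appended."""
    rng = random.Random(seed)
    features = list(x)
    for _ in range(num_layers):
        features = features + dense_layer(features, growth_rate, rng)
    return features

def max_pool_2x2(fmap: list[list[float]]) -> list[list[float]]:
    """2x2 max pooling with stride 2 on one feature map (an odd last row/column is dropped)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

if __name__ == "__main__":
    out = dense_block([1.0, 2.0, 3.0], num_layers=4, growth_rate=2)
    print(len(out))  # 3 inputs + 4 layers * 2 new features = 11
    print(max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))
```

Because the block input is never overwritten, early features remain directly available to every later layer. This is the property that lets a relatively shallow but wide network such as DenseFood reuse features.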
Additionally, a combination of Softmax loss and center loss was used during training to minimize the variance within each category while maximizing the variance between different categories. The DenseFood, DenseNet121, and ResNet50 models were trained from scratch on the VIREO-172 dataset. In addition, we fine-tuned DenseNet121 and ResNet50 models pre-trained on the ImageNet dataset to extract features from the images in our dataset.
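The joint objective can be written as L = L_S + λ·L_C, where L_S is the softmax cross-entropy and L_C = ½·Σ_i ||x_i − c_{y_i}||² pulls each deep feature x_i toward the center c of its class. The sketch below is a minimal pure-Python illustration under stated assumptions. The weight `lam`, the center learning rate `alpha`, and all function names are hypothetical and are not values taken from the thesis. Both terms are summed over the mini-batch. The center update follows the rule from the original center-loss formulation.

```python
import math

def softmax_loss(logits: list[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def center_loss(features, labels, centers) -> float:
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2 (compacts each class around its center)."""
    return 0.5 * sum(
        sum((xd - cd) ** 2 for xd, cd in zip(x, centers[y]))
        for x, y in zip(features, labels)
    )

def joint_loss(logits_batch, features, labels, centers, lam: float = 0.01) -> float:
    """L = L_S + lam * L_C; `lam` is a hypothetical weight, not the thesis's setting."""
    l_s = sum(softmax_loss(z, y) for z, y in zip(logits_batch, labels))
    return l_s + lam * center_loss(features, labels, centers)

def update_centers(features, labels, centers, alpha: float = 0.5):
    """c_j <- c_j - alpha * sum_{i: y_i = j}(c_j - x_i) / (1 + n_j);
    classes absent from the batch keep their centers unchanged."""
    new_centers = []
    for j, c in enumerate(centers):
        members = [x for x, y in zip(features, labels) if y == j]
        delta = [sum(cd - x[d] for x in members) / (1 + len(members)) for d, cd in enumerate(c)]
        new_centers.append([cd - alpha * dd for cd, dd in zip(c, delta)])
    return new_centers

if __name__ == "__main__":
    print(joint_loss([[2.0, 0.5]], [[1.0, 1.0]], [0], [[0.0, 0.0]], lam=0.5))
    print(update_centers([[2.0, 0.0]], [0], [[0.0, 0.0], [5.0, 5.0]]))  # [[0.5, 0.0], [5.0, 5.0]]
```

The softmax term keeps the classes separable, while the center term shrinks the spread of features inside each class. Here λ trades off the two effects.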
Experimental results showed that DenseFood achieved higher accuracy than the other models trained from scratch. Its top-1 accuracy of 81.68% is also very close to that of the fine-tuned pre-trained models, with DenseNet121 and ResNet50 reaching 83.92% and 82.49% top-1 accuracy, respectively. Furthermore, the densely connected convolutional networks achieved higher accuracy than the ResNet model.
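Top-1 accuracy counts a test image as correct only when the class with the highest score matches its label. Top-k accuracy relaxes this to "the label is among the k highest scores." The sketch below is a minimal illustration of the metric. The function name `top_k_accuracy` and the toy scores are hypothetical and do not come from the thesis experiments.

```python
def top_k_accuracy(score_rows: list[list[float]], labels: list[int], k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(score_rows, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]  # toy values
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 correct
    print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 correct
```

On this metric, the reported gaps are 83.92 − 81.68 = 2.24 percentage points to the fine-tuned DenseNet121 and 82.49 − 81.68 = 0.81 points to the fine-tuned ResNet50.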