Authors: B. Nandini
The process of creating descriptions for the events depicted in an image is known as image captioning. Deep Learning Models can be used to accomplish this image captioning. It is an extremely difficult issue to automatically generate a caption or explanation for an image using any natural language sentence. It takes techniques from computer vision to comprehend the image's content and a language model from natural language processing to translate the comprehension of the image into words in the correct sequence. Deep learning and natural language processing have advanced to the point where creating captions for the provided photos is now simple. We use a Convolutional Neural Network (CNN) that has been trained beforehand to extract high-level features, such as objects, forms, and textures, from photos. A Long Short-Term Memory (LSTM) network, a kind of Recurrent Neural Network (RNN) that can handle sequential input like sentences, is then fed these features.
Comments: 8 Pages.
Download: PDF
[v1] 2024-07-13 20:32:56
Unique-IP document downloads: 234 times
Vixra.org is a pre-print repository rather than a journal. Articles hosted may not yet have been verified by peer-review and should be treated as preliminary. In particular, anything that appears to include financial or legal advice or proposed medical treatments should be treated with due caution. Vixra.org will not be responsible for any consequences of actions that result from any form of use of any documents on this website.
Add your own feedback and questions here:
You are equally welcome to be positive or negative about any paper but please be polite. If you are being critical you must mention at least one specific error, otherwise your comment will be deleted as unhelpful.