It is pretty much agreed that Google can, and probably does, read metadata embedded in photos, though whether that influences SEO in any way is still disputed. In fact, the conventional wisdom seems to be that search engines do not take photo-embedded text into account (assuming they can read it at all) and that embedding text in photos is generally a bad idea for a series of other, non-SEO reasons (mostly having to do with how accessible the information is to the user). At the same time, the question of whether text embedded in photos really “can’t be read by search engines” remains open. And as Google is making increasingly significant efforts in image recognition technology, having recently acquired DeepMind, it’s hard to believe that photo-embedded text is not an area of interest. In this long (and, we hope, interesting) article, we ran a few experiments to understand how Google approaches image search and to see what the implications are for SEO and digital marketing.
Why Should I Care About Images & SEO?
What’s the case for photo-embedded text? There are several intuitive scenarios that come to mind, of which logos seem the most obvious. In many cases, logos are basically text information in image form. Sure, it’s often just another iteration of the brand name, but it’s a relevant reiteration of it. This is probably why a technology that does exactly this has already been patented. Other brand-related examples come to mind, mostly in the form of online image advertisements. Obviously, there’s some interest in this. But let’s go back to the original question: why should we care about text that is embedded in pictures (other than logos)? The best answer is probably that… that’s just how people talk on the Internet nowadays. Through pictures. And I don’t mean that in the classic “an image is worth a thousand words” metaphorical sense, but in the sense that so much of the content, and of the way content is structured, relies on images used as lines in an imaginary dialogue, with text embedded in those images.
It has become commonplace to say that an image is worth a thousand words. It’s less commonly known exactly how many words (and keywords) Google makes of an image. What we can say with certainty is that in a lot of cases, and in most of our experiments, Google turned out to be pretty savvy at interpreting images, although it leans heavily on the text around them. That would explain why it interpreted the picture of a rear view of a red sports car as “90’s cars” instead of a new Ferrari, which is what it actually is: most likely, on most of the pages where similar images were hosted, the text surrounding the image talked about 90’s cars.
Admittedly, the design is fairly reminiscent of that era’s particular brand of ostentatious sports cars. The adjacent-text theory also explains the other misinterpreted image, the one with the “Try it for free” text. It also supports the idea that although reading image-embedded text may be on the books for Google, it’s not yet at the level of mastery needed to be universally implemented. When we repeated the experiment with our logo (which contains no embedded text), the search engine did a pretty good job of figuring out that the logo is related mainly to “cognitive seo”, probably by drawing conclusions from the text in the vicinity of the image’s various occurrences and compiling a best guess based on visually similar images, as determined by the arrangement of the pixels in the image.
The Tumblr-wielding youth of today are pretty tired of spending hours looking for that particular funny cat GIF (yeah, you know the one, that one) with something written all over it, only to get none of the desired results. We probably wouldn’t care about that either, but they are not only the consumers of tomorrow; they are already the consumers of today. According to the data, at least, Internet usage in the 18-29 demographic is approaching 90% worldwide. If the even younger are at all different, it’s probably upwards, not downwards. Ultimately, so much of Internet content is images that you just can’t ignore it. Neither can search engines. People are drawn to images (when they’re relevant) more than they are to text. So being able to put text on images, as opposed to below or near them, might just be the next best thing. This is not just a trend, and it likely will not go away. Is anyone doing anything about it, though?
Interesting Google SEO Experiments with Images, Embedded Text, EXIF Data and More
1. Yes! Google Can Read Embedded Text in Images
Yes, Google can read embedded text in images, and it does it very well. Take, for instance, Google Keep, Google’s note-taking service, which takes the idea of “note taking” to another level: if your note consists of a picture, such as a photo of a book page, you can have Google Keep transcribe the text for you instantly.
Besides, optical character recognition (OCR) technologies are already used on a large scale, not least by Google itself for scanning books in the Google Books service. The main problem OCR developers have to deal with is accuracy: as long as it stays below 100%, the process cannot be made fully automatic. This might be why Google could use the technology regularly in search without yet letting it affect rankings.
Furthermore, Google does a fine job of extracting text from scanned PDFs as well. That’s right, scanned PDFs, where the text is not selectable. We took a passage from the text and searched for it in Google. Guess what happened! The big G had digitized the content and returned the exact phrase we were looking for, even though that text was actually in a scanned PDF, which is basically an image.
There is a slew of other image-recognition-related patents, mostly focusing on object recognition (an image recognition search patent, an image recognition methods patent, a pixel hashing image recognition system, etc.), which cover everything from privacy and social networks to driverless cars. Judging by one of the big G’s patents, identifying and using keywords extracted from images seems to be one of Google’s main concerns.
2. Does Google Read Exif Data from Images?
It’s no secret, however, that Google does take other types of data into account. On the issue of EXIF data (metadata about the picture that comes from the camera, such as focal length, ISO, lens type, etc.), Matt Cutts evasively admits that Google “reserve(s) the right to potentially use” the data for ranking purposes.
So, for instance, if you took your picture with a 50 mm prime lens of a certain brand and type, and this information is recorded as EXIF data, it is possible that whenever someone looks for information about that particular lens, they will also be directed to examples of shots taken with it, and in particular to your site. And while Cutts treats the matter in a relaxed, you-have-it-it’s-fine-you-don’t-it’s-fine manner, it’s quite clearly a situation where it’s “finer” if you have it. Which is probably why there’s a site that “EXIF-ies” your photos if they were slighted at “birth”, doing its best to add EXIF metadata to them as if they were real, straight out of the camera’s mouth.
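To make the EXIF discussion more concrete, here is a minimal Python sketch (standard library only) that pulls a few text fields, such as the camera make, model and lens model, out of a JPEG’s APP1/EXIF segment. It only illustrates what this metadata looks like to a program; it is not a claim about how Google reads it. The function names are hypothetical, and the sketch handles only ASCII tags in IFD0 and the Exif sub-IFD, skipping everything else a real EXIF library would parse.

```python
import struct

# A few human-readable ASCII tags from the EXIF/TIFF specification.
ASCII_TAGS = {0x010F: "Make", 0x0110: "Model", 0xA434: "LensModel"}
EXIF_IFD_POINTER = 0x8769  # IFD0 entry pointing to the Exif sub-IFD
TYPE_ASCII = 2


def _read_ifd(tiff: bytes, offset: int, endian: str, out: dict) -> None:
    """Collect known ASCII tags from one IFD, following the Exif sub-IFD pointer."""
    (count,) = struct.unpack_from(endian + "H", tiff, offset)
    for i in range(count):
        entry = offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, typ, n, value = struct.unpack_from(endian + "HHII", tiff, entry)
        if tag == EXIF_IFD_POINTER:
            _read_ifd(tiff, value, endian, out)
        elif tag in ASCII_TAGS and typ == TYPE_ASCII:
            # Strings of 4 bytes or fewer sit inline in the value field;
            # longer ones live at an offset from the start of the TIFF header.
            start = entry + 8 if n <= 4 else value
            raw = tiff[start:start + n]
            out[ASCII_TAGS[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")


def read_exif_text(jpeg: bytes) -> dict:
    """Return selected ASCII EXIF fields from JPEG bytes (empty dict if none)."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xD9, 0xDA):  # end of image / start of scan: no more metadata
            break
        (length,) = struct.unpack_from(">H", jpeg, pos + 2)
        segment = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and segment.startswith(b"Exif\x00\x00"):
            tiff = segment[6:]
            endian = "<" if tiff[:2] == b"II" else ">"
            (ifd0,) = struct.unpack_from(endian + "I", tiff, 4)
            out: dict = {}
            _read_ifd(tiff, ifd0, endian, out)
            return out
        pos += 2 + length
    return {}
```

For anything beyond a quick look, a maintained library such as Pillow (`Image.getexif()`) is the safer choice; the hand-rolled parser just shows where fields like the lens model actually live in the file.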
3. How Does Google Decide Which Image to Rank Higher (from the same site/URL)
Still, that’s not what really interests us. Is there something more beneath the surface, just like with the overused cliché of an iceberg? Or is there less to the subject than we’ve assumed, just like with the ever-shrinking Arctic ice cap? Speaking of which, how do we know how much ice there is in the polar caps at any given time? Unlike Google’s ranking algorithms, the answer to this question is pretty straightforward: scientists use what’s called the Pan-Arctic Ice Ocean Modeling and Assimilation System (PIOMAS for short). Aside from being an interesting piece of trivia, this is also useful for our SEO-related queries. We ran an image search for “piomas arctic sea ice volume”. Of the images the search returned, one in particular had this exact phrase inside it, as picture-embedded text.
On the site where it is found, there is no mention of the exact keyword match we looked for. According to the official story, this is easily explained by the fact that the site had the text “arctic sea ice volume” in the vicinity of the picture, and the picture itself had the title “PIOMAS Spiral”. Which is probably the case.
With one exception, all the other pictures at the top of that search have “PIOMAS” somewhere in the name or alt text and “arctic sea ice volume” somewhere in the text near the picture. And that one exception has it the other way around. It’s all reasonable here, folks, nothing to see, move along. But on your way home, consider the fact that of the seven top pictures, five also happen to have 4 of the 5 search words as embedded text, and the other two have 3 of the 5. Not enough to prove the case, but enough to support it, given that the search returned quite a few results and those particular pictures came up first.
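The word tally above is easy to reproduce. This small Python sketch counts how many distinct query words also appear in an image’s embedded text. The sample embedded strings are hypothetical stand-ins, not transcriptions of the actual search results, and the function only mirrors the hand count we did; it is not a model of how Google ranks images.

```python
import re


def words(text: str) -> set:
    """Lowercase alphanumeric words in a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_coverage(query: str, embedded_text: str) -> tuple:
    """Return (matched, total): how many distinct query words occur in the image text."""
    query_words = words(query)
    return len(query_words & words(embedded_text)), len(query_words)


query = "piomas arctic sea ice volume"
# Hypothetical embedded texts, for illustration only.
print(query_coverage(query, "PIOMAS Arctic Sea Ice Volume Anomaly"))  # (5, 5)
print(query_coverage(query, "Arctic sea ice volume, 1979-2013"))      # (4, 5)
print(query_coverage(query, "Sea ice volume"))                        # (3, 5)
```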
4. Does Embedded Text in Images Affect Your SEO?
Unfortunately, the issue of whether or not Google reads text that is embedded in pictures is much less cut-and-dried than that of the melting polar caps.
Since it’s easier to disprove than to prove something, we tried a different experiment this time, one that started with an image.
We ran a reverse image search with an image of the text “Google” (going a bit meta here), fully expecting Google to catch on. It didn’t. In fact, it went quite a long way around and, for reasons that are not entirely clear, associated the image with the keywords “eagle eye solutions”. At least, the images it found most similar to ours had mostly to do with that. The image results are somewhat varied, but if we were to guess, we’d say they were most likely the result of a basic similarity algorithm working at the pixel and color level: all of them use mostly (if not entirely) black text and have roughly the same width and height. Disappointingly, that’s pretty much it. Assuming Google didn’t do that just to mess with this article (it didn’t), this search suggests that the search engine does not, in fact, extract text from images to use in its search queries. At least not as a general rule.
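Our guess of “a basic similarity algorithm done at a pixel and color level” can be illustrated with average hashing (aHash), a well-known perceptual-hash technique. The Python sketch below is an assumption about the general kind of method involved, not Google’s actual algorithm. It takes a grayscale image as a 2D list of 0-255 values (decoding a real file would need an imaging library), shrinks it to 8x8, and sets one bit per cell depending on whether that cell is brighter than the average. Images whose hashes differ in only a few bits look alike at this coarse level, whatever the text in them says.

```python
def downscale(pixels, size=8):
    """Shrink a grayscale grid (list of rows, values 0-255) to size x size by block averaging."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for i in range(size):
        r0 = i * height // size
        r1 = max((i + 1) * height // size, r0 + 1)
        row = []
        for j in range(size):
            c0 = j * width // size
            c1 = max((j + 1) * width // size, c0 + 1)
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """64-bit aHash: one bit per cell, 1 if the cell is brighter than the mean."""
    cells = [v for row in downscale(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes (0 = identical at this resolution)."""
    return bin(a ^ b).count("1")


# Usage with synthetic 16x16 images.
left_dark = [[0] * 8 + [255] * 8 for _ in range(16)]
left_dark_brighter = [[10] * 8 + [255] * 8 for _ in range(16)]
top_dark = [[0] * 16 for _ in range(8)] + [[255] * 16 for _ in range(8)]
print(hamming(average_hash(left_dark), average_hash(left_dark_brighter)))  # 0
print(hamming(average_hash(left_dark), average_hash(top_dark)))            # 32
```

A hash like this only sees the layout of light and dark areas, which is consistent with two unrelated images of black text at similar dimensions being treated as “similar”.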
5. How to Best Optimize Your Images for SEO
So then, how do you use images to your advantage when it comes to SEO? Google itself has a few nuggets of wisdom on this. The two prerequisites for your images to turn up in searches are:
- have content that is easy to crawl
- have your images in one of the supported formats (it can be any of the classic image formats, like BMP, GIF, JPEG, PNG, WebP or SVG).
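One quick way to catch files whose extension doesn’t match their contents (say, a PNG saved as .jpg) is to look at the file signature. Here’s a minimal Python sketch that checks the leading bytes for the formats listed above. It’s a heuristic of our own, not how Google detects formats, and the SVG check in particular is only a rough text match.

```python
from typing import Optional

# Leading "magic" bytes for the raster formats listed above.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
]


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the image format from a file's first bytes; None if unrecognized."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # WebP is a RIFF container whose form type (bytes 8-11) is "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "WebP"
    # SVG is XML text, so look for an <svg element near the start (heuristic).
    if b"<svg" in data[:1024].lower():
        return "SVG"
    return None
```

In practice you would read the first kilobyte or so of each file (for example with `open(path, "rb").read(1024)`) and compare the result with the file extension.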
So far so good, right? Remember, however, that these two conditions only make sure that your images are indexed at all. There are, of course, other things you can do to maximize your chances of showing up at the top of the results page. Give the file a name that is directly related to the image content, add a description that makes the image easier for your readers to understand, and place the image on a page where it actually belongs and enhances the text (avoid keyword stuffing in the image ALT). Whether or not you use an Image Sitemap, how much metadata is attached to your image and whether or not you mark an image as adult-restricted will also influence how often, and how high, it turns up for a given search. And remember: while you have nothing to lose from embedding some text in your pictures (and potentially a little to win), you should not embed important text that isn’t also written somewhere on the same page.
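If you decide to use an Image Sitemap, the entries are ordinary sitemap url elements with image:image children in Google’s image-sitemap namespace. The Python sketch below builds one with the standard library. The page and image URLs are placeholders (note the descriptive file name, in line with the naming advice above), and since Google has trimmed the optional image tags over time, check its current image-sitemap documentation; this sketch emits only image:loc.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """pages: dict mapping a page URL to the list of image URLs that appear on it."""
    # Serialize the sitemap namespace as the default and the image one as "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


print(build_image_sitemap({
    "https://example.com/red-sports-car-review": [
        "https://example.com/images/red-sports-car-rear-view.jpg",
    ],
}))
```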
Best Google Images Use Cases for Internet Marketers
It’s not all about rankings though, it’s about the bigger picture: your brand. Some of the most important things related to the brand have to do with an entirely different issue: copyright. Here too, Google’s ability in handling images can be useful for a number of reasons.
1. How to Find the People Who Use Your Images
One of the easiest things to gain online is a quick rise to fame, which is useful, especially if you’re a young artist or new to the field of design. You draw an image, or take a picture of something that appeases (and pleases) the almighty Internet users. They then have myriad ways of sharing that image (reblogging, reposting, retweeting, etc.) on their own pages and websites. Voilà: you’ve squeezed yourself some fresh SEO juice. The catch is that web users sometimes give you credit and sometimes (a lot of times) don’t. If you’re simply curious about how far your fame has spread, you can run an image search with the actual image you’re interested in. Google will turn up the various uses of the image across the web, including on social sites. From there you can check each search result to see exactly the context in which the image appears.
2. How to Find the People Who Mention Your Brand in Images
You may have heard that brand mentions are now considered the new links. To make things even clearer, Google has a patent stating in black and white that brand mentions or citations are “implied links”. So finding people who mention your brand in images might come in really handy in light of these recent changes. As we can see in the screenshot below, finding mentions of your brand this way can really make your communication manager’s day. Not only did the search engine “guess” what symbol I was looking for, but it also helped me easily find the pages that include the image, that is, new or old mentions of the brand.
3. How to Find People Who Are Stealing Your Pictures
This is also useful when trying to find out who republished your image without permission or without giving due credit. Considering how easy it is to reblog or repost an image without also copying the accompanying text, it makes sense that credit often gets lost along the way. Moreover, a republished image may sometimes rank higher than its original source. Since there isn’t any automation for this yet, your best chance is to keep a close eye on the free flow of information. Keep in mind that if this seems like too daunting a task for some images, you can combine image and text as search criteria for a more refined, targeted query. Google usually adds the text by itself, but it’s useful to guide it at times.
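If you do collect copies of candidate images yourself (downloaded from the pages an image search turns up, for instance), a rough first pass over them can be scripted with a difference hash (dHash), another common perceptual hash. The Python sketch below is hypothetical tooling, not a Google feature. It expects each image already reduced to an 8-row by 9-column grayscale grid, records whether each pixel is brighter than its right-hand neighbour, and flags candidates whose hash is within a chosen number of bits of your original. The 10-bit threshold is an arbitrary starting point.

```python
def difference_hash(grid):
    """64-bit dHash of an 8x9 grayscale grid: 1 where a pixel is brighter than its right neighbour."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def flag_likely_copies(original, candidates, max_distance=10):
    """Return names of candidate grids whose dHash is within max_distance bits of the original."""
    reference = difference_hash(original)
    return sorted(
        name
        for name, grid in candidates.items()
        if bin(reference ^ difference_hash(grid)).count("1") <= max_distance
    )


# Usage with synthetic grids: a brightened repost is flagged, a different image is not.
original = [[10 * c for c in range(9)] for _ in range(8)]
brightened = [[v + 5 for v in row] for row in original]
unrelated = [list(reversed(row)) for row in original]  # stands in for a different image
print(flag_likely_copies(original, {"repost.jpg": brightened, "other.jpg": unrelated}))
# ['repost.jpg']
```

Simple hashes like this survive resizing and brightness tweaks but not mirroring or heavy cropping, so treat the output as a shortlist to check by hand.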
4. How to Find Agencies or Sites that Steal Your Logo Concept
Even worse than someone stealing your drawings or pictures is someone stealing your logo concept, which is why it’s useful that you can search for that too. We searched for a logo and found two different companies using the same design. The trouble with logos, of course, is that the part we are interested in may itself be “embedded” in (or simply part of) bigger pictures that include embedded text or additional graphics. If the bigger picture is indexed and crawled as such, there’s a much higher chance Google will not return it as a top result, or at all. But insofar as most logo thieves steal because they’re lazy, the most common scenario is that logo designs are stolen as-is, and you can at the very least sift through the most blatant cases of theft.
Can Duplicate or Low Quality Images Attract Google Penalties?
It goes without saying that Google is fighting a long-term battle against those who seek to manipulate the search results for their own ends, while trying to provide users with the best results it can. Google keeps changing and improving its algorithm in order to encourage webmasters to provide the best content they can for their users. So, if Google is focusing on boosting engaging content and penalizing low-quality content, wouldn’t it make sense for it to apply the same rules to images and photos? Google keeps the bar high when it comes to original content. However, Matt Cutts himself mentions that using stock imagery versus original imagery has no impact on organic web ranking. The way I see it, that’s a bit of an interesting contradiction. What’s more, isn’t it possible that Google’s algorithm uses alt tags, image captions or embedded text in images as ranking metrics? Matt Cutts addresses this area too, answering with a short “no”. Still, the head of Google’s spam team does not rule out the possibility that Google will update its algorithm to filter for original image content. Judging by the way Google keeps rolling out sweeping algorithm updates, we can expect a public statement from the big G’s representatives in the near future, informing us that images are taken into consideration when determining rankings.
Conclusion
The best way to predict the future is to invent it – Alan Kay
Even if you have ignored Google’s image search until now, it might be high time you used it. Not only can you manage your brand activity more clearly, but you can also understand market trends or your competitors’ strategies. As I said at the beginning of the article, Google can and probably does read metadata embedded in photos. Does it make use of it every time? Will it use it as a ranking factor in the near future? We can’t know for sure, but we can certainly be ready for it. In other words, it’s better to be safe than sorry, so the best time to take care of your images is now. In the near future, Google might flag poor or duplicate content in images or revise the way it “reads” and ranks images. It’s better to prepare for that now, because even if nothing comes of it, you will still have better content.
Thanks for the in-depth research!
There was no doubt that images are being “read”, given that YouTube easily transcribes all its videos, but only a deeper look could reveal some interesting insights!
Visual content isn’t less popular than text, and its optimization must not be neglected!
Exactly. Think ahead: it will be used by Google as a ranking signal, so why not optimize before they do?
Great analysis!
It would appear that colour/shape is more influential than image text, but it could easily change. I just wrote a post about how Google picked up my optimised “SEO” image as “SS Division” (Nazi party)! The black background seemed to override the alt text. Image best practice should probably involve checking G for keyword images before trying to get creative.
Probably. It should be checked in Google first to see what they already think about it.
really great article…good read! I suspected as such. I own a fitness site and started ranking for keywords on webpages that were picture heavy or exclusively pictures.
One of the best cognitive posts Razvan, hats off! 🙂
🙂 tks Amod.
Probably one of the best posts I have read in recent times. So, Google can read images, but still has some limitations when it comes to ‘text’ in images. That’s interesting. Great job Razvan. 🙂
Great post Razvan. I have been using Google’s image search options to find who uses my images or to trace who has stolen my pictures. That’s really interesting! But I hadn’t heard that Google can capture the text from scanned PDF documents in search results; hard to believe. Anyhow, Google is always bringing new changes to its search algorithms and pushing webmasters (those who really don’t follow Google’s guidelines) into trouble 🙂 Thanks for sharing this post!
Wow. Great article. Thanks so much for this. I have just been having conversations about image alt tags and the naming of images. So now let’s just assume that Google is ‘reading’ the image as well. Google is Big Brother, really. Let’s just assume they know everything! 🙂
Really great stuff, and a lot deeper than I first thought. I have a nutrition webshop myself, and we actually benefit greatly from marking up product pictures with meta information and doing just a tiny bit of SEO on them. For competitive products we are outranking most of our competitors, and for those looking at product pictures, we are actually boosting our sales by as much as 2%, which is half a month of salary for one employee, just by adding meta info.
Second, I find it a bit scary that Google is really able to grab text directly from images. In Denmark this is already going to be used for surveillance. Later this year all Danish police cars will have an OCR camera attached with a 360-degree angle, so it will automatically register every car the police car passes. This info can be used for a whole lot of purposes, including ones that are scary for ordinary, non-criminal citizens.
Google already uses the OCR technology in the Mapping process. When they scan the roads they register all written info ( even the writing on walls :)) . So this is a thing that is already happening 🙂 … and this is only the beginning.
4 years ago, I taught my 7-year-old to build his website. His homework was to write a page that got him a number 1 ranking. He chose the low-competition phrase “Future Leader Of The Planet”. I suppose he’s still number 1. The point is, you could pick a “no competition phrase” and optimize the image properties for that phrase. Then, without mentioning it in the post, upload the image. A couple of weeks later, search the phrase and see if it shows up in the image results. If it does, you will know that Google DOES read and use the image properties/details. You might even use a different phrase for the title, subject, tags, and comments to see which, if any or all, are used.
As a side note, I suppose this page will soon be a runner up for my son’s homework assignment phrase! Isn’t SEO fun, lol.
Did the test. As shown in the article above, they have the technology but do not use it on a large scale for the search engine yet. They probably will in the future.
I wonder whether this has any implications for websites that use stock photography: could an image that appears on your site and thousands of others now be considered duplicate content?
This is a really well researched article, thanks heaps for posting.
In the future it probably will. But for now, the original image may not appear for the correct term, and a duplicate could appear instead based on the text near the image, the alt text, the image name, etc.
But after the March 12 Google update (known as Florida 2), all sites with copied images are penalised. My site is also a victim.
An interesting and useful blog post Razvan! Thanks for sharing this detailed blog about how Google considers usage of images.
Dear Razvan this is the best post I have read in understanding how Google understands the images we post. Thank you for taking the time to put all this together. TY
glad you like it and tks for the appreciation Camila.
Hi Razvan,
I want to thank you for being someone who, from time to time, talks publicly about Google consuming metadata within JPG/PNG image files.
Your post is worth sitting comfortably and reading carefully. About metadata, I have personally been betting on inserting and editing XMP and IPTC data in all “relevant” images, and this has been going on for years now. I wonder when Google will make my sleeping work worth the effort and research…
Great in depth review. Thank you for taking the time to put so many different aspects down and ways to find stolen or re-used images. I know of someone where it cost them $40000 for using images they grabbed off the internet. Ouch.
Hi Razvan, thanks for the interesting read! How did you conduct your experiments? Did you completely separate the tested images from all context (URL etc…)?
This is absolutely amazing. The article precisely describes how Google’s SEO tech works for images. It is also a good reminder that images must be taken seriously.
Basically, I came here looking for reasons why an image could outrank your content. Some of my images show up in Google Web search (the way we see YouTube videos there, not in Google Images) for some important keywords instead of the posts themselves. Do you have any suggestions? Thanks in advance.
Wow, this is interesting. I wonder how much further their algorithm will improve. Looks like I’ll have to add some images with text and do some split testing. Appreciate the article!
Side Note – this G SERP returned a pdf that was text as vector images, and Google scanned and read the vector based image and indexed it.
https://www.google.com/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&q=site%3Awww.labomed.com%2Fpdf%2FW-2100_p1_11-12-14.pdf%20high%20technology%20spectrophotometer%20for%20testing%20water%2C%20waste%20water%2C%20agricultural%2C%20food%2C%20industry%2C%20chemistry%2C%20environmental%2C%20and&oq=site%3Awww.labomed.com%2Fpdf%2FW-2100_p1_11-12-14.pdf%20high%20technology%20spectrophotometer%20for%20testing%20water%2C%20waste%20water%2C%20agricultural%2C%20food%2C%20industry%2C%20chemistry%2C%20environmental%2C%20and&aqs=chrome..69i57j69i58.6855j0j4
Nice catch, Matt. Indeed, Google has indexed text that was rendered as vectors 🙂
Hi Razvan,
Thanks for the in-depth information; however, I have a query:
Will there be any negative effect on the rankings of a particular page if the page has content copied from a third-party website in image form (like infographics and banners)?
Thanks
I think Google has the answer on this one. I will cite directly from the Google Search Quality Evaluator Guidelines from March 2017 :
The Lowest rating is appropriate if all or almost all of the main content on the page is copied with little or no time, effort, expertise, manual curation, or added value for users. Such pages should be rated Lowest, even if the page assigns credit for the content to another source.
Therefore, according to Google, copied content (regardless of its nature) can get you into trouble.
Hi, I’ve read your article and really got some useful information. Thanks a lot! Now I have a problem and want to ask you for an advice. I changed some of my products’ images so the links of images changed as well. Will this affect the SEO? (The alt of images are the same as before)
Hi Razvan
I agree with you, Google definitely reads text in images, I just checked a couple of mine where the text only appears on the image as part of graphic, and Google displays that image in search results for the exact match key word.
Great article mate. thanks for the research.
Razvan!
Appreciate you taking the time to put this content together. I was checking out other sources online to see whether text overlaying images could be crawled as well. Definitely got a lot of valuable content. That pop-up opt-in on the site is pretty crafty as well. Those eyes are something else! hahah Thanks again!
Can the XMP data of an MP4 be searched? If I put a transcript of the video in the XMP file, will it be able to search through the XMP data to locate keywords?
Really cool insight into image text & EXIF data, nicely put together. I’ve noticed more and more images using text in them than previous years. Looks like SEOs are specifically putting text in images. I’ll be running some tests myself in this area, thanks for the inspiration.
Definitely an interesting article. I remember there was a time when Google couldn’t even read PDF documents. A lot of PDF docs are uploaded on high-authority domains; do you think it would be possible to rank on those pages (we’ll add some content) by taking advantage of these websites? Maybe in a niche that is not very competitive?
Let me know what you guys think.
Cheers