It is pretty much agreed that Google can and probably does read metadata embedded in photos, though whether that influences SEO in any way is still disputed. In fact, the conventional wisdom seems to be that search engines do not take into account photo-embedded text (assuming they can read it at all) and that the practice of embedding text in photos is generally a bad idea for a series of other non-SEO reasons (mostly having to do with accessibility of the information for the user). At the same time, the question of whether text embedded in photos really “can’t be read by search engines” remains open. And as Google is making increasingly significant efforts in the direction of image recognition technology, having recently acquired DeepMind, it’s hard to believe that photo-embedded text is not an area of interest. In this long and (we hope) interesting article we ran a few experiments in order to understand how Google is approaching image search and to see what the implications are for the SEO and digital marketing field.
Why Should I Care About Images & SEO?
What’s the case for photo-embedded text? There are several intuitive scenarios that come to mind, out of which the case of logos seems like the most obvious. Logos are basically text information, in a lot of cases, but in image form. Sure, it’s probably just another iteration of the brand name in many cases, but it’s a relevant reiteration of it. This is probably why there is already a patented technology that does exactly this. Other brand-related examples come to mind, mostly in the form of online image advertisements. Obviously, there’s some interest in this. But let’s go back to the original question: why should we care about text that is embedded in pictures (other than logos)? The best answer is probably that… that’s just how people talk over the Internet nowadays. Through pictures. And I don’t mean that in the classic “an image is worth a thousand words” metaphorical sense, but in that so much of the content and of the way content is structured has to do with the use of images as lines in an imaginary dialogue, with text embedded in those images.
It has become commonplace to say that an image is worth a thousand words. It’s less common knowledge exactly how many words (and keywords) Google makes of an image. What we can say with certainty is that in a lot of cases and in most of our experiments, Google turned out to be pretty savvy in interpreting images. It seems to lean heavily on the text surrounding an image, which would explain why it interpreted the picture of a rear view of a red sports car as “90’s cars” instead of a new Ferrari, which it is: in most places where similar images were hosted, the text surrounding the image probably talked about 90’s cars.
Admittedly, the design is fairly reminiscent of that era’s particular brand of ostentatious sports cars. The adjacent text theory also explains the other misinterpreted image, the one with the “Try it for free” text. It also supports the idea that although reading image-embedded text may be on the cards for Google, it’s not yet at the level of mastery needed to be universally implemented. Repeating the experiment for our logo (which contains no embedded text), the search engine did a pretty good job figuring out that the logo is related mainly to “cognitive seo”, probably by drawing conclusions from the text in the vicinity of various image occurrences and compiling a best guess based on visually similar images, where similarity depends on the arrangement of the pixels in the image.
The Tumblr-wielding youth of today are pretty tired of having to look for hours and hours, with none of the desired results, for that particular funny cat gif (yeah, you know the one, that one) that has something written all over it. We probably wouldn’t care about that either, but they are not only the consumers of tomorrow, but already the consumers of today. According to the data at least, the 18-29 demographic is knocking at 90% Internet usage worldwide. If the even-younger are at all different, it’s probably upwards, not downwards. Ultimately, so much of Internet content is images that you just can’t ignore it. Neither can search engines. People are drawn to images (when they’re relevant) more than they are to text. So being able to put text on images as opposed to below or near them might just be the next best thing. This is not just a trend and likely will not go away. Is anyone doing anything about it though?
Interesting Google SEO Experiments
with Images, Embedded Text, Exif Data and More
1. Yes! Google Can Read Embedded Text in Images
Yes, Google can read embedded text in images and it’s doing it very well. Let’s take for instance Google Keep, the note-taking service from Google that takes the idea of “note taking” to another level. This is because you can have Google Keep transcribe the text for you instantly if your note consists of a picture, such as that of a book.
Besides, optical character recognition (OCR) technologies are already used on a large scale, mostly by Google itself for scanning books in the Google Books service. The main problem OCR developers have to deal with is the less-than-100%-accuracy issue, since near-perfect accuracy is vital for making the process fully automatic. Perhaps this is why Google might deploy the technology regularly for searches, but not yet let it affect rankings.
Furthermore, Google does a fine job of extracting text from scanned PDFs as well. That’s right, scanned PDFs, where the text is not selectable. We took part of the text and ran it as a search query in Google. Guess what happened! The big G was able to digitize the content and returned the exact phrase that we were looking for, even though that text was actually in a scanned PDF, basically an image.
There is a slew of other image-recognition-related patents, mostly focusing on object recognition (image recognition search patent, image recognition methods patent, pixel hashing image recognition system, etc.), which cover everything from privacy and social networks to driverless cars. Identifying and using keywords extracted from images seems to be one of Google’s main concerns, judging by one of the big G’s patents.
2. Does Google Read Exif Data from Images?
It’s no secret, however, that Google does take into account other types of data. On the issue of EXIF data (metadata about the picture coming from the camera, such as focal distance, ISO, lens type, etc.), Matt Cutts elusively admits that Google “reserve(s) the right to potentially use” the data for ranking purposes.
So, for instance, if you took your picture with a 50 mm prime lens of a certain brand and type, and this information gets recorded as EXIF data, it is possible that whenever someone looks for data about that particular brand and type of lens, they will also be directed to examples of shots taken with that lens, and in particular to your site. And while Cutts treats the matter in a relaxed, you-have-it-it’s-fine-you-don’t-it’s-fine manner, it’s quite clearly a situation where it’s “finer” if you have it. Which is probably why there’s a site that “EXIF-ies” your photos if they were slighted at “birth” and does its best to add EXIF metadata to them as if they were real, straight out of the camera’s mouth.
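If you are not sure whether your own photos carry EXIF data at all, you can check for the block that stores it. The Python sketch below is only a minimal illustration, not a full EXIF reader: it assumes a well-formed JPEG and merely reports whether the file contains an APP1 segment that starts with the "Exif" identifier. The function name has_exif_segment is our own invention, and reading the individual fields (focal distance, ISO, lens type) would require a dedicated library.

```python
import struct


def has_exif_segment(jpeg_bytes: bytes) -> bool:
    """Return True if the JPEG data contains an APP1 segment tagged 'Exif'.

    Simplified sketch: it walks the segment headers that precede the image
    data and gives up on anything it does not recognise.
    """
    if not jpeg_bytes.startswith(b"\xff\xd8"):  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return False  # not at a marker: malformed or unsupported layout
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            return False
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False


# Tiny synthetic headers (not real photos), just to exercise the function.
jfif_segment = b"\xff\xe0\x00\x07JFIF\x00"
exif_segment = b"\xff\xe1\x00\x0aExif\x00\x00II"
print(has_exif_segment(b"\xff\xd8" + jfif_segment + exif_segment))  # True
print(has_exif_segment(b"\xff\xd8" + jfif_segment + b"\xff\xda\x00\x02"))  # False
```

To test a real photo, read the file in binary mode and pass its contents to the function.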
3. How Does Google Decide Which Image to Rank Higher (from the same site/URL)
Still, that’s not what really interests us. Is there something more beneath the surface, just like with the overused cliche of an iceberg? Or is there less than we’ve assumed to the subject, just like with the ever shrinking arctic ice cap? Speaking of which, how do we know how much ice there is in the polar caps at any given time? Unlike Google ranking algorithms, the answer to this question is pretty straightforward: scientists use what’s called the Pan-Arctic Ice Ocean Modeling and Assimilation System (in short: PIOMAS). Aside from the perk of being an interesting piece of trivia, this is also useful to our SEO-related queries. We tried to do an image search for “piomas arctic sea ice volume”. Of the images that the search returned, there was one in particular that had this exact phrase inside it, as picture-embedded text.
On the site where it is found, there is no mention of the exact keyword match we looked for. According to the official story, this is easily explained by the fact that the site had the text “arctic sea ice volume” in the vicinity of the picture, and the picture itself had the title “PIOMAS Spiral”. Which is probably the case.
With one exception, all the other pictures on top of that search have “PIOMAS” somewhere in the name or Alt Text and “arctic sea ice volume” somewhere in a text in the vicinity of the picture. And that one exception has this the other way around. It’s all reasonable here folks, nothing to see, move along. But on your way home, take into consideration the fact that of the seven top pictures, five also happen to have 4 of the 5 search words as embedded text, and two pictures have 3 of the 5 search words as embedded text. Not enough to prove the case, but enough to support it in the light of the fact that there are quite a few search results, but those particular pictures came up first.
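The word counts above were done by hand, but the same tally is easy to sketch in Python. This is only an illustration of the arithmetic: the function name query_word_coverage is ours, and the second sample string is a made-up placeholder rather than the actual text found in any of the images.

```python
import re


def query_word_coverage(query, embedded_text):
    """Return (matched, total): how many query words occur in the embedded text."""
    query_words = re.findall(r"[a-z0-9]+", query.lower())
    text_words = set(re.findall(r"[a-z0-9]+", embedded_text.lower()))
    matched = sum(1 for word in query_words if word in text_words)
    return matched, len(query_words)


query = "piomas arctic sea ice volume"

# The exact phrase, as in the image mentioned earlier.
print(query_word_coverage(query, "PIOMAS Arctic Sea Ice Volume"))  # (5, 5)

# Hypothetical embedded text that lacks the word "piomas".
print(query_word_coverage(query, "Arctic sea ice volume anomaly"))  # (4, 5)
```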
4. Does Embedded Text in Images Affect Your SEO?
Unfortunately, whether or not Google reads text that is embedded in pictures is a much less cut-and-dried issue than that of the melting polar caps.
Since it’s easier to disprove than to prove something, we tried a different experiment this time, one that started with an image.
We did an image search using an image of the text “Google” (going a bit meta here), fully expecting Google to catch on. It didn’t, however. In fact, it went quite a long way around and, for reasons that are not entirely clear, associated the image with the keywords “eagle eye solutions”. At least the images it found as being most similar to our own had to do mostly with that. The image results are somewhat varied, but if we were to guess, we’d say they were most likely the result of a basic similarity algorithm working at a pixel and color level. All the images use mostly, if not entirely, black text and are roughly the same width and height. Disappointingly, that’s pretty much it. Assuming Google didn’t do that just to mess with this article (it didn’t), this search was evidence that the search engine does not, in fact, extract text from images to use it in its search queries. At least not as a general rule.
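We can only guess at what Google actually does here, but to make “similarity at a pixel level” a bit more concrete, here is a minimal Python sketch of one well-known technique, the average hash. It is not Google’s algorithm, just an illustration. It assumes each image has already been shrunk to the same small grid of grayscale values (0 to 255); the function names and the sample grids are ours.

```python
def average_hash(pixels):
    """Turn a 2D grid of grayscale values into a tuple of bits.

    Each bit records whether a pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count differing bits; a smaller count means more similar images."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Made-up 3x3 grids standing in for heavily downscaled images.
original = [[0, 0, 255], [0, 255, 255], [255, 255, 255]]
recompressed = [[10, 5, 250], [20, 240, 245], [250, 255, 250]]  # same shape, slightly noisy
inverted = [[255, 255, 0], [255, 0, 0], [0, 0, 0]]

print(hamming_distance(average_hash(original), average_hash(recompressed)))  # 0
print(hamming_distance(average_hash(original), average_hash(inverted)))  # 9
```

A comparison like this knows nothing about letters or words, which would fit the results above: images with black text of roughly the same size look alike to it no matter what the text says.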
5. How to Best Optimize Your Images for SEO
So then, how do you use images to your advantage when it comes to SEO? Google itself has a few nuggets of wisdom on this. The two prerequisites to have your images turn up in searches are:
- have content that is easy to crawl
- have your images in one of the supported formats (it can be any of the classic image formats, like BMP, GIF, JPEG, PNG, WebP or SVG).
So far so good, right? Remember, however, that these two conditions only make sure that your images are indexed at all. There are, of course, some other things you can do to try and maximize your chances of showing up at the top of the results page. Give the file a name that is directly related to the image content. Add a description that makes the image easier for your readers to understand (and avoid keyword stuffing in the image ALT). Place the image on a page where it actually belongs and enhances the text. Whether or not you use an Image Sitemap, how much metadata is attached to your image and whether or not you mark an image as adult-restricted will also influence how often and how highly ranked it will turn up on a given search. And remember: while you have nothing to lose from embedding some text in your pictures (and potentially a little to win), you should not embed important text that you don’t have written somewhere on the same page as well.
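Some of the advice above can be checked automatically. The Python sketch below uses only the standard library to scan a page’s HTML and flag images with an unsupported format, a generic file name or a missing (or suspiciously long) ALT text. The list of formats comes from the prerequisites above; the file name pattern and the 16-word ALT limit are our own arbitrary assumptions, not Google rules, and the names ImageAudit and audit_images are hypothetical.

```python
import re
from html.parser import HTMLParser

# Formats listed above as supported (".jpg" is the usual JPEG extension).
SUPPORTED_EXTENSIONS = {"bmp", "gif", "jpeg", "jpg", "png", "webp", "svg"}
# Assumed heuristics: camera-style file names and a word cap for ALT text.
GENERIC_NAME = re.compile(r"^(img|dsc|image|photo)[-_]?\d*\.", re.IGNORECASE)
MAX_ALT_WORDS = 16


class ImageAudit(HTMLParser):
    """Collects (src, problem) pairs for every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = (attributes.get("alt") or "").strip()
        # Drop any query string, then keep the last path component.
        filename = src.split("?", 1)[0].rsplit("/", 1)[-1]
        extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""

        if extension not in SUPPORTED_EXTENSIONS:
            self.issues.append((src, "unsupported or unknown format"))
        if GENERIC_NAME.match(filename):
            self.issues.append((src, "generic file name"))
        if not alt:
            self.issues.append((src, "missing alt text"))
        elif len(alt.split()) > MAX_ALT_WORDS:
            self.issues.append((src, "alt text may be keyword-stuffed"))


def audit_images(html):
    """Return the list of problems found in the given HTML string."""
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.issues


page = (
    '<img src="/img/piomas-spiral.png" alt="PIOMAS arctic sea ice volume spiral">'
    '<img src="/img/IMG_0001.tiff">'
)
for src, problem in audit_images(page):
    print(src, "->", problem)
```

In this made-up page the first image passes every check, while the second is flagged three times: for its format, its file name and its missing ALT text.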
Best Google Images Use Cases for Internet Marketers
It’s not all about rankings though, it’s about the bigger picture: your brand. Some of the most important things related to the brand have to do with an entirely different issue: copyright. Here too, Google’s ability in handling images can be useful for a number of reasons.
1. How to Find the People Who Use Your Images
One of the easiest things to gain online is a quick rise to fame. Which is useful, especially if you’re a young artist, or new in the field of design. You draw an image, or take a picture of something that appeases (and pleases) the almighty Internet users. They then have myriad ways of sharing that image (reblogging, reposting, retweeting etc.) to their own pages and websites. Voilà: you’ve squeezed yourself some fresh SEO juice. Only sometimes web users give you credit and sometimes (a lot of times) they don’t. If you’re simply curious about how far your fame has spread, you can do an image search with the actual image you’re interested in. Google will turn up the various uses of the image across the web, including its presence on social sites. From there you can check out each search result to see exactly the context in which the image appears.
2. How to Find the People Who Mention Your Brand in Images
You may have heard that brand mentions are now considered the new links. To make things even clearer, Google has a patent where it is written in black and white that brand mentions or citations are “implied links”. Finding people who mention your brand in images might therefore come in really handy against the background of these recent changes. As we can see in the screenshot below, the process of finding mentions of your brand can really make your communication manager’s day better. Not only did the search engine “guess” what symbol I was looking for, but it also helped me easily figure out which pages include the image, whether as new or old mentions of the brand.
3. How to Find People Who Are Stealing Your Pictures
This is also a useful process in trying to find out who republished your image without permission or without giving due credit. Considering how easy it is to reblog or repost an image without also copying the accompanying text, it makes sense that credit is easily discarded somewhere along the way in a lot of cases. Moreover, it is entirely possible that a republished image will sometimes rank higher than its original source. Since there isn’t any automation on this yet, your best chance is to keep a close eye on the free flow of information. Keep in mind that if this seems like too daunting a task for some images, you can combine image and text as search criteria for a more refined, targeted inquiry. Google usually adds the text by itself, but it’s useful to guide it at times.
4. How to Find Agencies or Sites that Steal Your Logo Concept
Worse yet than someone stealing your drawings or pictures is someone stealing your logo concept. Which is why it’s useful that you can look for that too. We went and searched for a logo that yielded two different companies using the same design. The trouble with logos, of course, is that there is always a chance that the part of the logo we might be interested in is itself “embedded” in (or simply part of) bigger pictures, that include embedded text or additional graphics. If the big picture is indexed and crawled as such, there’s a much higher chance Google will not return it as a top result, or as a result at all. But insofar as most logo thieves steal because they’re lazy, the most common scenario is that logo designs are stolen as such and you can at the very least sift through the most blatant cases of theft.
Can Duplicate or Low Quality Images Attract Google Penalties?
It goes without saying that Google is trying to win a long-term battle against those who seek to manipulate the search results for their own ends, and to provide users with the best results it can. Google keeps changing and improving its algorithm in order to encourage webmasters to provide the best content they can for their users. So, if Google is focusing on boosting engaging content and penalizing low-quality content, wouldn’t it make sense for it to apply the same rules to images and photos? Google is keeping the bar high when it comes to original content. However, Matt Cutts himself mentions that there is no impact on the organic web ranking if you use stock imagery versus original imagery. It’s a bit of an interesting contradiction here, the way I see it. Even more, isn’t it possible that Google’s algorithm uses alt tags, image captions or embedded text in images as ranking metrics? The same Matt Cutts tries to clarify things in this area, answering with a short “no”. Still, the head of Google’s spam team acknowledges the possibility that Google will update its algorithm to filter for original image content. Judging by the way Google keeps revising its algorithms, we can expect a public response in the near future from the big G representatives, informing us that images are taken into consideration when deciding rankings.
The best way to predict the future is to invent it – Alan Kay
Even if you have ignored Google’s image search so far, it might be high time you used it. Not only can you keep track of your brand activity, but you can also understand the market’s tendencies or your competitors’ strategies. As I was saying at the beginning of the article, Google can and probably does read metadata embedded in photos. Does it make use of it every time? Will it use it as a ranking factor in the near future? We can’t know for sure, but we can certainly be ready for it. Put another way, it’s better to be safe than sorry, so the best time to take care of your images is now. In the near future, Google might flag poor or duplicate content in images or might revise the way it “reads” and ranks images. It’s better to prepare now for such a thing to happen, because even if nothing comes to pass, you will still have better content.