Technical SEO

“It has been said, by engineers themselves, that given enough money, they can accomplish virtually anything: send men to the moon, dig a tunnel under the English Channel. There’s no reason they couldn’t likewise devise ways to protect infrastructure from the worst hurricanes, earthquakes and other calamities, natural and manmade.”

Henry Petroski


All SEO is Technical

Technical SEO is the art of engineering the foundation of your website(s) to create optimal search engine, and often, customer experiences. If you want to boost SEO metrics, focus on content and link building. If you want to boost your KPIs, focus on technical SEO. For the SEO team, it’s a constant battle of best practices, Google’s Webmaster guidelines, legacy code, faulty QA processes, constant changes in search engine algorithms and user behavior, IT project prioritization, translating technical concepts into simple business terms for executive buy-in, custom tools development to manage SEO processes and page life cycles, and last but most important, empowering engineers.

The questions below are in random order, intended to cover a ginormous range of topics that an enterprise, e-commerce SEO team should address. Topics include: web server logs, faceted navigation, sitemaps, custom tools, QA processes, platforms, code, semantic markup, entity search, internationalization, mobile technology, scalable content, automation, ROI estimation, and more.


“Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no structural integrity, but just done by brute force.”

Alan Kay


Questions Worth Asking

Development & Quality Assurance

What type of Software Development Life Cycle (SDLC) does the company, and specifically the IT organization, use: agile or waterfall?

How do you win resources (developers and development hours) for SEO projects in the battle for prioritization against all other digital marketing, e-commerce and miscellaneous business teams?

Which developers in the IT (aka IS) department primarily work on SEO-related enhancements?

Who in QA, quality assurance, is responsible for SEO regression testing?

If the SEO team wants its own developer and QA resources, how should you make the case to your executives?

If SEO is bypassing typical project cycles via dedicated developers and QA resources to accelerate production, what processes should you establish so that you’re not creating conflict with the main project cycles?

How should the SEO team write and design project requirements? Epics and User stories?

How should SEO be constantly informed of code, database, or server-level changes made to the website that may directly or indirectly impact SEO processes or performance?

How should someone from the SEO team test front-end, or back-end changes during deployments? Which staging, development, or testing environments should SEO have access to for testing?

How does your company’s IT organization define and perform user-acceptance testing, quality assurance testing, and regression analysis?

How should you teach SEO best practices (general rules of thumb) within the IT organization so that the development team, the QA team, the infrastructure team, the database team, and/or the networking/security teams know how to proceed on their own, or to at least contact the SEO team for guidance?

How should the SEO team and IT split up testing (QA, UAT, etc.) responsibilities and build standard processes before, during or after code deployments?

During most deployments, which mistakes do you notice most often? Which good habits surface?

A robust SEO foundation takes years to build, and can be destroyed in a single deployment without the right protection and precaution, aka Quality Assurance. What parts of QA should be done manually, and what parts should be automated via scripts?
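
For the automated portion, even a small script that checks status codes, titles, canonicals, and robots directives on a handful of critical URLs after each deployment will catch the most damaging regressions. Below is a minimal sketch, assuming Python with the requests and beautifulsoup4 libraries; the URLs and expected canonicals are hypothetical placeholders.

```python
# A minimal sketch of an automated post-deployment SEO regression check.
# The URLs and expected canonical values below are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

CRITICAL_URLS = {
    # url: expected canonical (illustrative examples)
    "https://www.example.com/": "https://www.example.com/",
    "https://www.example.com/shoes/": "https://www.example.com/shoes/",
}

def audit(url, expected_canonical):
    resp = requests.get(url, timeout=10, headers={"User-Agent": "seo-regression-check"})
    issues = []
    if resp.status_code != 200:
        issues.append(f"status {resp.status_code}")
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.find("title")
    if not title or not title.get_text(strip=True):
        issues.append("missing <title>")
    canonical = soup.select_one('link[rel="canonical"]')
    if not canonical or canonical.get("href") != expected_canonical:
        issues.append("canonical changed or missing")
    robots = soup.find("meta", attrs={"name": "robots"})
    if robots and "noindex" in robots.get("content", "").lower():
        issues.append("unexpected noindex")
    return issues

for url, canonical in CRITICAL_URLS.items():
    problems = audit(url, canonical)
    print(url, "OK" if not problems else problems)
```

A check like this could run in the deployment pipeline or as a scheduled job, alerting the SEO team when a critical template regresses.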


A successful partnership between IT and SEO requires a balance of trust, autonomy and clear communication. Therefore, SEO walks a fine line between autonomy and micro-management: give too much instruction and you might get exactly what you need to satisfy search engines, but you’re stepping on toes by telling professionals how to do their jobs; give instructions that are too loose and you might not get a surefire Google-friendly enhancement, but you’re building trust with your IT partners.

The Inner Workings of the IT (Information Systems) Organization

How should you work and communicate so that you find the balance between building a good partnership and producing effective enhancements?

What servers, platforms, systems, codebase(s), languages, and databases is IT using? Which SEO elements touch each one of these?

 

What platforms, servers, database systems, code base, and files power your entire e-commerce website?

What are XML Sitemaps, and who generates, updates and tests them on a regular basis?

What rules and guidelines are in place to include only canonical and active category, product, static, and content URLs in Sitemaps?

How should XML Sitemaps be organized so that you don’t have a random, unordered list of 50K URLs per Sitemap?
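
One common approach is to split Sitemaps by page template (category, product, static, etc.) and keep each file well under the 50K-URL limit, so that Search Console reports indexation per page type. Below is a minimal sketch, assuming Python; the file names and the url_groups data are hypothetical stand-ins for whatever your catalog actually feeds you.

```python
# A minimal sketch of splitting URLs into type-specific sitemap files,
# rather than one unordered 50K-URL dump. File names and url_groups are hypothetical.
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000

def write_sitemap(path, urls):
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
        for url in urls[:MAX_URLS_PER_FILE]:
            f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        f.write("</urlset>\n")

# Group canonical, indexable URLs by template so indexation and errors
# surface per page type (e.g., sitemap-category.xml vs sitemap-product.xml).
url_groups = {
    "sitemap-category.xml": ["https://www.example.com/shoes/"],
    "sitemap-product.xml": ["https://www.example.com/p/12345/"],
}
for filename, urls in url_groups.items():
    write_sitemap(filename, sorted(urls))
```

A sitemap index file would then reference each per-type file, and robots.txt or Search Console would point to the index.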

How should you address image, video and other multimedia assets?

Besides XML Sitemaps, what other pages or files should you build to help search engines crawl your most important links? E.g., HTML sitemaps, RSS feeds, Atom feeds, etc.

What about your robots.txt file? What directives should you use to help the search engine crawlers most important to your business navigate your site?

What crawlers, bots and scraper user-agents should you disallow from crawling your site?
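
Before shipping new Disallow rules, it helps to verify what your live robots.txt actually allows or blocks for specific crawlers. Below is a minimal sketch using Python's standard-library robots.txt parser; the domain, paths, and user-agents are illustrative only.

```python
# A hedged sketch: check what the live robots.txt allows or blocks per crawler.
# The domain, paths, and user-agents below are illustrative.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

checks = [
    ("Googlebot", "https://www.example.com/shoes/"),
    ("Googlebot", "https://www.example.com/search?q=nike"),  # on-site search results
    ("AhrefsBot", "https://www.example.com/shoes/"),          # third-party crawler
]
for agent, url in checks:
    print(agent, url, "allowed" if rp.can_fetch(agent, url) else "blocked")
```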

The Website

What information architecture does your website use to link customers and bots from the home page down to the deepest level (e.g., product pages, blog posts, etc.)?

How many unique page templates – dynamic, static and hybrid – make up the entire website?

Tip: It’s common to have one or more page templates for category, product, facet, static and blog content URLs. Focusing your work at the template level, as opposed to page level or site level, leads to better results at scale.

Which CMS, files, and databases generate the content (header, body, footer) on each page template? Who has control and access to these?

How are meta tags populated by default logic (standards) on all of your page templates? Think Yoast SEO’s title tag and meta description template variables.

What is the process for updating these standards (think Yoast SEO templates) to generate titles, meta descriptions, canonical URLs, robots tags for all of your different page templates?
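
Conceptually, that default logic is just a set of per-template patterns filled in from catalog data, which the SEO team can then override on high-opportunity pages. Below is a minimal sketch, assuming a simple page-data dictionary; the template strings and field names are hypothetical.

```python
# A minimal sketch of template-level default meta logic (think Yoast-style
# variables). The templates and field names are hypothetical examples.
TITLE_TEMPLATES = {
    "category": "{category} | {site}",
    "facet":    "{facet} {category} | {site}",
    "product":  "{product} - {brand} | {site}",
}

def build_title(page):
    template = TITLE_TEMPLATES.get(page["template"], "{site}")
    return template.format(site="Example Store", **page)

print(build_title({"template": "category", "category": "Running Shoes"}))
# -> Running Shoes | Example Store
print(build_title({"template": "facet", "category": "Running Shoes", "facet": "Nike"}))
# -> Nike Running Shoes | Example Store
```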


“Java is to JavaScript what Car is to Carpet.”

Chris Heilmann


What are static URLs and dynamic URLs? What are absolute URLs and relative URLs?

On your home page, what type of URLs exist, and how are they generated?

On your category page templates, what type of URLs exist, and how are they generated?

On your faceted navigation page templates, what type of URLs exist, and how are they generated?

What is faceted navigation, and how should you take advantage of it to optimize for short, mid, and long-tail keywords?

On your product page templates, what type of URLs exist, and how are they generated?

How are URL patterns defined for each page template? How does sorting, filtering and paginating impact URL parameters?

Maximizing search engine crawl budgets helps with limiting duplicate content indexation, preventing bot traps, and improving site-wide link distribution. How should you use rules (canonical URLs, meta tags, robots.txt, Ignore Parameters feature in Webmaster Tools) in order to help Google better navigate your site?
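
A large share of crawl waste comes from parameters that only re-sort, re-paginate, or track the same content. Below is a minimal sketch of normalizing such URLs to a canonical form, using Python's standard library; the parameter list is a hypothetical example and should come from your own parameter and log analysis.

```python
# A minimal sketch (assumed parameter names) of normalizing faceted/sorted URLs
# to a canonical form by dropping parameters that change presentation, not content.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

NON_CANONICAL_PARAMS = {"sort", "view", "per_page", "utm_source", "utm_medium", "utm_campaign"}

def canonicalize(url):
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in NON_CANONICAL_PARAMS]
    kept.sort()  # stable parameter order avoids duplicate URL variants
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonicalize("https://www.example.com/shoes?brand=nike&sort=price&utm_source=email"))
# -> https://www.example.com/shoes?brand=nike
```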

Who manages URL redirects and what, if any, process allows SEO to add redirects?

Who has access to web server logs? Generally it’s the infrastructure / networking / security team.

What are user-agent strings, and how can you use them for web logs analysis?

How should SEO use web logs to analyze search engine crawler activity, and identify areas of improvement?

What other benefits can SEO, and other marketing / e-commerce teams reap from analyzing web logs?

A big challenge with obtaining server logs is file size. How should SEO make the case to invest in tools, or improve methods, in order to get the right amount of data?
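
Even before investing in a log-analysis platform, a simple script can answer the first questions: which paths is Googlebot hitting, and how many of those requests end in errors? Below is a minimal sketch, assuming a combined-format Apache/Nginx access log at a hypothetical path; production analysis should also verify Googlebot via reverse DNS rather than trusting the user-agent string.

```python
# A hedged sketch of mining a combined-format access log for Googlebot activity:
# which paths it hits, and which return 4xx/5xx errors. "access.log" is a placeholder.
import re
from collections import Counter

LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

crawled, errors = Counter(), Counter()
with open("access.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        m = LINE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue  # real analysis should verify Googlebot via reverse DNS
        crawled[m.group("path")] += 1
        if m.group("status").startswith(("4", "5")):
            errors[(m.group("path"), m.group("status"))] += 1

print("Top crawled paths:", crawled.most_common(10))
print("Top crawl errors:", errors.most_common(10))
```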

What triggers soft and hard 404s and other 4xx, 5xx error responses on your site? Which error types are the most frequently occurring?

How many errors does Webmaster Tools normally report on your site(s)?

What patterns do you spot in the 1,000 error sample URLs?

If WMT reports a total of 100K errors, but gives you only the 1,000 samples, how can you identify the remainder of the list?

Which errors should you redirect one by one, and which URL fixes should you scale (possible via RegEx)?

What are Regular Expressions (RegEx), and why should you learn the basic and advanced use cases?

How can Regular Expressions / wildcard patterns help you redirect hundreds or thousands of URL permutations using only a single line of code?
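
The idea is that one pattern captures an entire class of legacy URLs and maps them to their current equivalents. Below is a minimal sketch in Python with a hypothetical legacy URL structure; the same kind of pattern is what an Apache RewriteRule or nginx rewrite would express.

```python
# A minimal sketch of one regex rule handling thousands of URL permutations:
# hypothetical legacy /shop/<category>/<id>.html URLs map to a current /c/<category>/ structure.
import re

LEGACY_CATEGORY = re.compile(r"^/shop/([a-z0-9-]+)/\d+\.html$")

def redirect_target(path):
    m = LEGACY_CATEGORY.match(path)
    return f"/c/{m.group(1)}/" if m else None  # None -> no redirect rule matched

for old in ["/shop/running-shoes/1234.html", "/shop/nike/98765.html", "/about-us"]:
    print(old, "->", redirect_target(old))
```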

Overall, dynamically-generated URLs cause more duplicate content issues / 404 issues / crawl waste than static URLs on large e-commerce sites. How should you code links, and use canonical URLs, robots.txt, robots tags in ways that ensure search engines concentrate crawls, indexation and link distribution to the most search-worthy pages?

How do category IDs and product IDs (commonly set up in product catalog systems) affect URLs, titles, descriptions, and internal linking?

Who sets up, inactivates, merges, or deletes category and product IDs?

How do they form parent-child relationships between categories, sub-categories, facets, and products in the overall site structure?

Are they restricted to primary relationships, or can they also create secondary, tertiary relationships so that products can be found via multiple categories?

Primary path relationships are family trees, but with categories, facets, and products. Each path starts with the highest-level parent category, its sub-categories, the relevant facets (attributes), and ends with the product. These relationships are maintained in the back end, usually your product and inventory management system, but an easy way to spot them on the front end is by looking at the breadcrumb path on category and product pages.

An example from Zappos.com, where the Shoes category page is the top-level parent, and its children are two attributes from two different facet groups, Sneakers & Athletic Shoes and Nike: http://www.zappos.com/nike-air-max-infuriate-low

How should you use the primary-path relationships to improve site structure for both search engine crawlers and customers?

How can you use primary path relationships to create automatic 301 redirect logic for when products, facets and categories inactivate (out of stock, discontinuation, etc.)
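
One way to express that logic: when a product or facet page inactivates, walk up its primary path and 301 to the nearest ancestor that is still active, rather than serving a 404. Below is a minimal sketch, assuming the catalog exposes parent and status data; PRIMARY_PARENT and ACTIVE are hypothetical stand-ins for that data.

```python
# A minimal sketch of primary-path fallback redirects for inactivated pages.
# PRIMARY_PARENT and ACTIVE are hypothetical stand-ins for catalog data.
PRIMARY_PARENT = {
    "/nike-air-max-infuriate-low": "/sneakers-athletic-shoes/nike",
    "/sneakers-athletic-shoes/nike": "/shoes",
    "/shoes": "/",
}
ACTIVE = {"/shoes", "/"}

def fallback_redirect(url):
    """Walk up the primary path until an active ancestor is found."""
    parent = PRIMARY_PARENT.get(url)
    while parent is not None and parent not in ACTIVE:
        parent = PRIMARY_PARENT.get(parent)
    return parent or "/"

print(fallback_redirect("/nike-air-max-infuriate-low"))  # -> /shoes
```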

How should you override default meta tag logic to strategically update title tags, descriptions, canonical URLs, robots tags, and copy blocks on high-opportunity pages? What tools should you use, or custom build, to make this happen in standard SEO processes?

Who’s going to build and maintain these custom tools?

Who’s responsible for data audits and quality tests?

Who’s responsible for regression testing SEO functionality and data after a fresh code release?

Why are breadcrumbs vital for site structure and a strong technical SEO foundation?

Which type of mark-up should you apply (schema.org, microdata, etc.) to your breadcrumbs?
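
Breadcrumb markup is typically expressed as a schema.org BreadcrumbList, and it can be generated from the same primary-path data discussed above. Below is a minimal JSON-LD sketch generated from Python; the names and URLs are illustrative.

```python
# A hedged sketch of schema.org BreadcrumbList markup (JSON-LD) built from
# primary-path data. The crumb names and URLs are illustrative.
import json

def breadcrumb_jsonld(crumbs):
    """crumbs: ordered list of (name, url) from top-level category to leaf."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(crumbs, start=1)
        ],
    }, indent=2)

print(breadcrumb_jsonld([
    ("Shoes", "https://www.example.com/shoes/"),
    ("Sneakers & Athletic Shoes", "https://www.example.com/shoes/sneakers-athletic-shoes/"),
    ("Nike", "https://www.example.com/shoes/sneakers-athletic-shoes/nike/"),
]))
```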

Big retail sites are increasingly injecting and displaying third-party content / applications on their site(s), primarily by using some combination of APIs, JavaScript, AJAX, HTML, CSS, etc. How should you leverage the content (product reviews, customer comments, product up-sell/cross-sell widgets, marketing promotions, etc.) so that search engines can crawl, index, and cache all of this content on the page?

Even a small portion of code/content on a single page can greatly influence your rankings. On large sites, this can have a significant revenue impact. How much should you trust Google’s claims of successfully crawling, indexing and crediting all non-HTML and non-CSS content on all of your most vital pages?

How does the DOM (Document Object Model) work, and how does it influence page loading on your site? By extension, how does the DOM impact search engine crawling, indexing, and caching of JavaScript, AJAX and other content? How does this compare to view-source?

What are the key similarities and differences between Inspect Element and View Source?
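
A rough way to quantify the gap between the two is to compare the raw HTML response with the rendered DOM for the same URL. Below is a hedged sketch, assuming Python with the requests and playwright packages installed (plus a browser downloaded via `playwright install`); the URL is hypothetical, and counting `<a ` occurrences is only a crude proxy for rendered-only links.

```python
# A hedged sketch comparing raw source (view-source) with the rendered DOM.
# Requires: pip install requests playwright && playwright install chromium
import requests
from playwright.sync_api import sync_playwright

url = "https://www.example.com/p/12345/"  # hypothetical product URL

raw_html = requests.get(url, timeout=10).text  # what view-source shows

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(url, wait_until="networkidle")
    rendered_html = page.content()  # roughly what Inspect Element shows
    browser.close()

# Crude proxy: links present only after JavaScript/AJAX execution.
print("links in raw source:  ", raw_html.count("<a "))
print("links in rendered DOM:", rendered_html.count("<a "))
```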

What is semantic markup, and how should you use it on your site to optimize meta data?

Keyword search is based on common nouns; entity search is based on proper nouns. What is the Knowledge Graph, and how should you use it on your site to optimize for entity search: real-world proper nouns such as people, places, and things?

How should you analyze your site for potential entities, and subsequently, how should you optimize so that you can yield additional traffic via rich snippets, featured snippets, Google News/Top Headlines, image search, etc.?

What are the benefits of snippets, and in what ways can you quantify revenue impact?

Which technical enhancements will improve performance of page templates with high growth potential?

How should you quantify results from the constant improvements to the site via technical SEO?

Tip: unlike most other marketing vehicles, SEO requires little to no marketing / advertising spend to pull in the same or more revenue year after year. In other words, if you invest $100K in technical SEO and it returns $500K in sales in 2017, it often returns the $500K in future years without incremental spend.

How should you perform technical audits of your site’s SEO foundation? When using tools such as Screaming Frog, OnPage.org, or Botify, what settings should you use to perform efficient site crawls? Talk to your server teams in IT: do they have any crawl speed limits during peak hours of business?

When, and how often, should you perform site audits: during strategic roadmap planning, pre and post web code releases, weekly/monthly?

How do various page templates (category vs product vs facet vs static pages vs blog content) account for non-brand SEO traffic and revenue? In other words, do category pages bring you more non-brand sales than product pages? Which has more growth potential?

Every large retail site has 3 types of URLs: the great, the good, and the cruft. The great and good URLs (usually the category, category+facet, static, product) range from 20%-40% of the total available URLs for Google to crawl, index and rank. The remaining 60-80% are junk URLs (sorting options, product results per page, category+multiple facets, on-site search results, etc.) with thin content, aka cruft, because they contain little to no organic search value.

How should you concentrate search engine activity on the top 20-40% pages, so that you can better rank for short, mid and long-tail terms that are important for customers to search for?
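
A first step is simply measuring the split: classify every URL from a crawl export or log file into great/good vs. cruft buckets and see where crawl activity actually goes. Below is a minimal sketch with hypothetical URL patterns; the real rules should mirror your own template and parameter conventions.

```python
# A minimal sketch (hypothetical URL patterns) of bucketing URLs into
# great/good vs. cruft so you can see where crawl budget is going.
import re
from collections import Counter

RULES = [
    ("cruft", re.compile(r"[?&](sort|per_page|view)=")),  # presentation parameters
    ("cruft", re.compile(r"^/search\b")),                  # on-site search results
    ("good",  re.compile(r"^/p/")),                        # product pages
    ("great", re.compile(r"^/c/[a-z0-9-]+/?$")),           # category pages
]

def classify(path):
    for label, pattern in RULES:
        if pattern.search(path):
            return label
    return "review"  # unmatched URLs need a human look

counts = Counter(classify(p) for p in [
    "/c/running-shoes/", "/p/nike-air-max-123", "/c/running-shoes/?sort=price", "/search?q=nike",
])
print(counts)
```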

The PageRank of your big-brand retail site’s home page is like your city’s water plant, and all of your category, facet, product and static pages are its residents. How should you pipe your hyperlinks, i.e., maximize distribution and minimize waste of the flowing PageRank, across thousands of pages on your site?

Another way to think about it: how do you lower the number of links per page on your category, facet and product pages?

How do you work with UX/IT to decrease or increase the number of links on a page in a way that focuses on customer experience, not just search engines?

Tip: consider talking to the UX and IT teams about testing a possible content silo structure. When implemented effectively, reducing the number of links in your headers on silo pages may boost rankings (more link juice for links in the body) and conversion (fewer interruptions to adding-to-cart).

What is your index status in Webmaster Tools? How many estimated URLs should Google index from your site?

How should you use XML Sitemaps, product feeds, canonical URLs, robots tags, robots.txt, and/or the Ignore Parameters feature in Webmaster Tools to expand or collapse the number of pages indexed?

What is your mobile SEO strategy? Do you have a separate mobile site or tablet site?

Do your mobile and/or tablet sites use proper canonical rules?

Do content and meta data flow down from desktop pages to the mobile/tablet pages?

If you don’t have separate mobile sites, does your company use adaptive or responsive design?

What is mobile-first indexing? What implications does it have on your overall SEO strategy?

Is an AMP implementation right for your business?

Bonus Considerations

For even deeper critical thinking about SEO, check out Michael King’s post on the Technical SEO Renaissance.

  • What “changes in web technology are causing a technical renaissance?”
  • What’s up with Google’s AngularJS and Facebook’s React JavaScript MVW frameworks?
  • What is HTTP/2 and how could it impact security and speed on your site?
  • What’s missing in SEO tools and why are they not up to par with how search engines really work?

Takeaway

Technical SEO creates sustainable and profitable long-term growth. Earn dedicated resources for SEO projects now so that projects that should’ve been done years ago don’t take years to complete. A strong technical foundation amplifies the value of your links and content, thereby increasing your rankings site-wide. It also creates a competitive advantage that lets sites with decent domain authority outperform sites with higher domain authority. Every big site is challenged with organizational bottlenecks in creating optimal search engine experiences, so if you can get over this hurdle, you can take more market share in SERPs.

Also remember, it’s not just your own projects that will impact SEO. Every customer-facing enhancement or new feature may impact SEO, so talk to your business partners and review the requirements and user stories before development. Do it right the first time.
