We source our data by crawling a business's website and using machine learning and other algorithms to extract the relevant data.
Let's take a look at the main data points and how we gather them:
Email addresses & phone numbers
We find them on the company's website. They're publicly available, which makes it easy to answer the question "how did you find my email?" if a prospect asks.
We don't guess or try email patterns, which is the main reason our users see lower bounce rates than with other tools. Even so, it's good practice to run addresses through an email checking tool.
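To illustrate the difference between finding and guessing, here is a minimal sketch that only returns email addresses literally present in a page's HTML. The pattern and the function name `extract_emails` are assumptions made for this example, not our production extractor.

```python
import re
from html import unescape

# Simplified pattern for illustration; real-world address syntax is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return the unique addresses that appear in the page, in order of first appearance."""
    text = unescape(html)  # decode entities such as &#64; before matching
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)  # case-insensitive de-duplication
    return list(seen)
```

Nothing is generated from a name or a pattern such as first.last@domain: if an address is not in the page, it is not returned.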
We have a machine learning (ML) model that extracts all the postal addresses found on a company's website.
There is usually more than one address on a company's website, so right after extracting them we assign a score to each, based on factors such as:
- how "complete" an address is -> e.g. an address made of city, state, street, number and zip will score higher than an address made of just city and state
- what page the address is found on -> an address found on the "contact us" page will score higher than an address found on other pages
- how many times an address occurs on the website -> an address mentioned 5 times will score higher than an address found once
That's how we identify the "main location" of a company.
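The scoring idea can be sketched roughly as follows. This is a simplified illustration, not our actual model: the weights, the `AddressCandidate` class and the page labels are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical address parts and page weights, for illustration only.
ADDRESS_PARTS = ("street", "number", "city", "state", "zip")
PAGE_WEIGHTS = {"contact": 3.0, "about": 2.0}  # any other page counts as 1.0


@dataclass
class AddressCandidate:
    fields: dict[str, str]  # parsed parts, e.g. {"city": "Austin", "state": "TX"}
    page: str               # type of page the address was found on
    occurrences: int        # how many times it appears across the website


def score(candidate: AddressCandidate) -> float:
    """Combine completeness, page type and frequency into a single score."""
    completeness = sum(1 for part in ADDRESS_PARTS if candidate.fields.get(part))
    page_bonus = PAGE_WEIGHTS.get(candidate.page, 1.0)
    frequency = 0.5 * min(candidate.occurrences, 10)  # capped so footers don't dominate
    return completeness + page_bonus + frequency


def main_location(candidates: list[AddressCandidate]) -> AddressCandidate:
    """Pick the highest-scoring candidate as the main location."""
    return max(candidates, key=score)
```

With these assumed weights, a full address found 5 times on the "contact us" page comfortably outranks a city-and-state mention found once on a blog post.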
We analyse the text we find on the website and pass it to an ML model that can identify over 500 business categories.
This has the advantage that we can always expand how granular our categorisation is. So if you need us to add another category, please let us know :)
The disadvantage is that it's not as accurate on non-English websites. We are working on that.
The website audit is run with Lighthouse, an open-source tool from Google.
Technologies, social media, external links
To find technologies, we look at the URLs and cookies we see on the website and match them against our database of technologies we can detect.
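As a rough sketch of this kind of matching, the example below checks a site's URLs and cookie names against a small signature table. The `SIGNATURES` entries and the function name are illustrative assumptions; our real database is larger and is not structured exactly like this.

```python
# Hypothetical signature table: technology -> URL fragments and cookie names.
SIGNATURES = {
    "Google Analytics": {"urls": ["google-analytics.com"], "cookies": ["_ga"]},
    "Shopify": {"urls": ["cdn.shopify.com"], "cookies": ["_shopify_y"]},
    "WordPress": {"urls": ["/wp-content/", "/wp-includes/"], "cookies": []},
}


def detect_technologies(urls: list[str], cookie_names: set[str]) -> list[str]:
    """Return the technologies whose URL fragment or cookie name is present."""
    found = set()
    for tech, signature in SIGNATURES.items():
        url_hit = any(fragment in url for fragment in signature["urls"] for url in urls)
        cookie_hit = any(name in cookie_names for name in signature["cookies"])
        if url_hit or cookie_hit:
            found.add(tech)
    return sorted(found)
```

A single match on either a URL or a cookie is enough in this sketch; a stricter version could require several signals before reporting a technology.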
Social media and external links are just links we find on the company's website. In the future, we will add social media data to the platform, but right now it's limited to what we find on the website.
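Collecting these links can be pictured with the sketch below, which uses only Python's standard library. The `SOCIAL_DOMAINS` list and the function names are assumptions for the example, and subdomains of the company's own site are treated as external here for simplicity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical, non-exhaustive list of social media hosts.
SOCIAL_DOMAINS = {"facebook.com", "twitter.com", "x.com", "linkedin.com", "instagram.com", "youtube.com"}


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(html: str, page_url: str) -> tuple[list[str], list[str]]:
    """Split the links on a page into (social media links, other external links)."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc.lower().removeprefix("www.")
    social, external = [], []
    for href in collector.hrefs:
        url = urljoin(page_url, href)  # resolve relative links against the page
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        host = parsed.netloc.lower().removeprefix("www.")
        if host == site_host:
            continue  # internal link, neither social nor external
        (social if host in SOCIAL_DOMAINS else external).append(url)
    return social, external
```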