Data Cleansing

Data has been cleaned with automated scripts. Please note that a computer is not a human: it can only tidy data using pre-defined logic. It is therefore not surprising to see entries with incomplete addresses, such as a missing shop number or street number. Some data formats are also not unified, phone numbers in particular. For example, instead of the country-standard (area code) xxxx xxxx, some numbers appear as one unbroken string of digits. You may also find duplicate entries even after automated cleansing. The reasons are explained below.


Duplicate removal has been applied to all the data we supply. We identify duplicates by comparing the combination of email address + address + phone + business name.

However, this will not eliminate 100% of duplicates, especially in datasets acquired from multiple sources, for the following reasons:

  1. When data comes from different sources, business information may be stored in slightly different ways that a computer cannot recognize as the same. For example, Shop 1, 93 Somewhere Hwy and 1/93 Somewhere Hwy are different addresses from a computer's perspective. Even within a single source, duplicates are possible because some businesses try to maximize exposure by creating multiple listings with different business name formats, e.g. ABC Company Limited and ABC Co Ltd.
  2. There is a practical reason for not using a single field to remove duplicate records. A business with multiple outlets, or a franchise, may share the same email, the same URL and even the same phone number across different addresses. If we combined fewer fields to define a unique entry, some outlets might be excluded.
  3. Ensuring 100% uniqueness requires either human review or complex comparison logic that lets a computer judge similarity the way a human does. This is doable; it is just a matter of cost. It is very labor-intensive work, and you may expect to pay 5 to 20 times more than you pay now.
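
As an illustration only, the comparison described above can be sketched in a few lines of Python. This is not our production script: the field names (name, email, phone, address) and the sample records are assumptions made up for the example. It shows that an exact repeat is removed, while the same shop written with a different address format is kept.

```python
def dedup_key(record):
    """Build the composite key: email + address + phone + business name.

    The field names used here are assumptions for this sketch.
    """
    fields = ("email", "address", "phone", "name")
    # Lower-case and collapse whitespace so trivial differences are ignored.
    return tuple(" ".join(str(record.get(f) or "").lower().split()) for f in fields)


def remove_duplicates(records):
    """Keep the first record seen for each composite key."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up sample records for illustration.
records = [
    {"name": "ABC Co Ltd", "email": "info@abc.example",
     "phone": "0212345678", "address": "Shop 1, 93 Somewhere Hwy"},
    # Exact repeat of the first record: removed.
    {"name": "ABC Co Ltd", "email": "info@abc.example",
     "phone": "0212345678", "address": "Shop 1, 93 Somewhere Hwy"},
    # Same shop, address written differently: the key differs, so it is kept.
    {"name": "ABC Co Ltd", "email": "info@abc.example",
     "phone": "0212345678", "address": "1/93 Somewhere Hwy"},
]

cleaned = remove_duplicates(records)
print(len(cleaned))
```

Catching the third record would need address parsing or similarity scoring on top of this, which is where the extra cost mentioned in point 3 comes from.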


It is not surprising to find inaccurate data in a list. Some records may belong to closed-down businesses, since no one actively updates records of this kind. Sometimes it can take 5 or 10 years before the records of a closed-down business are removed. As a result, there may be entries with an identical address and identical phone number but different business names, especially in the restaurant business, where a new owner takes over and changes the business name without changing the phone number. In our experience, duplicates of this kind are possible but very rare.

Present Format

Field formats may not be unified because the data comes from various supply sources. The field most affected is the phone number. Again, cleansing can be done to any degree, depending on the cost you are willing to pay.
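
If you prefer to unify phone numbers yourself, a minimal Python sketch follows. It assumes a 10-digit national number with a 2-digit area code, which is an assumption for illustration and not a rule that applies to every country; the function name format_phone and its parameters are hypothetical. Numbers that do not fit the assumed length are returned unchanged rather than guessed at.

```python
import re


def format_phone(raw, area_code_len=2, total_len=10):
    """Format a phone number as (area code) xxxx xxxx.

    Assumes total_len digits with an area code of area_code_len digits.
    Both defaults are assumptions for this sketch; adjust them per country.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, brackets, etc.
    if len(digits) != total_len:
        return raw  # unexpected length: leave the value untouched
    area = digits[:area_code_len]
    rest = digits[area_code_len:]
    return f"({area}) {rest[:4]} {rest[4:]}"


print(format_phone("0212345678"))    # (02) 1234 5678
print(format_phone("02-1234 5678"))  # (02) 1234 5678
print(format_phone("12345"))         # 12345
```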

We are able to supply data at any level of cleansing. Data in this store is priced according to the cleanliness and quantity of the files. Files are supplied AS THEY ARE. We make no explicit or implied guarantee regarding Uniqueness, Accuracy or Present Format. In addition, we occasionally offer discounts so that you can get a large number of sales leads at minimal cost.


Custom Data Request

Should you require particular data, by industry, by country, etc., and cannot find it in our store, please contact us via our contact form and let us know what you want, the quantity, and how clean you want the data to be. We can supply raw, generally cleaned, very clean, extremely clean, and top level (unique, with uniform format) data.

* Please note that we DO NOT collect personal data of any kind. We only collect business data.



When can I get the data after purchase?

You can download the data as soon as payment is successful. The download link can be found in your email or in the My Account section (registered users). Please note that the download link expires 90 days after purchase, and you can download up to 3 times during this period. Please keep a backup copy on your computer.


Any guarantee with purchase?

Sorry, no. All data is sold as it is. As mentioned, it is generally cleaned and ready to use, so you can start your sales and marketing activities straight away. The following rough guide to usable percentage may help you decide:

Multiple-sourced data

Raw Data (approx. 50%), Generally Cleaned (60-80%), Very Clean (70-90%), Extremely Clean (85-95%), Top Level (100%).

Single-sourced data

Raw Data (approx. 75%), Generally Cleaned (85%+), Very Clean (90%+), Extremely Clean (95%+), Top Level (100%).

Usable percentage takes into account duplicate entries, completeness of data (partial records missing one or two fields that you want to use, e.g. email or full address), and closed-down businesses.
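
To estimate a usable percentage for a file you already hold, a rough Python sketch is shown below. It is an illustration under assumptions, not the method behind the figures above: the field names and the choice of required fields (email and address) are hypothetical, and closed-down businesses are not counted because they cannot be detected from the file alone.

```python
def usable_percentage(records, required=("email", "address")):
    """Share of records that are not duplicates and have all required fields.

    Field names and the default required fields are assumptions for this
    sketch. Closed-down businesses are not detected here.
    """
    seen = set()
    usable = 0
    for record in records:
        key = tuple(
            str(record.get(f) or "").strip().lower()
            for f in ("email", "address", "phone", "name")
        )
        if key in seen:
            continue  # duplicate entry: not usable
        seen.add(key)
        if all(str(record.get(f) or "").strip() for f in required):
            usable += 1
    return 100.0 * usable / len(records) if records else 0.0


# Made-up sample: one complete record, one exact duplicate,
# one record missing an email, one more complete record.
sample = [
    {"name": "ABC Co Ltd", "email": "info@abc.example",
     "phone": "0212345678", "address": "93 Somewhere Hwy"},
    {"name": "ABC Co Ltd", "email": "info@abc.example",
     "phone": "0212345678", "address": "93 Somewhere Hwy"},
    {"name": "XYZ Cafe", "email": "",
     "phone": "0287654321", "address": "5 Other St"},
    {"name": "DEF Pty Ltd", "email": "hello@def.example",
     "phone": "0298765432", "address": "7 Another Rd"},
]

print(usable_percentage(sample))  # 50.0
```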


These figures come from experience. They are not fixed, because the result depends largely on luck. Businesses in some industries tend to have tidy data, e.g. professional business services; most of these businesses have a proper website and email address. Businesses in hospitality, construction, etc. tend to have much less complete information, especially website URL and email.

Apart from industry, country is another factor affecting data quality. Businesses in developed countries tend to have more comprehensive business information than those in developing countries.

Therefore, we cannot make any guarantee except for custom orders.


Can I get data with coordinates (Latitude, Longitude)?

Yes. We can do that with customized scripts and a paid third-party GeoLocation API, which also standardizes address, city, state, region, post code and country in one go, provided you are prepared to pay several times the General level cost, plus the cost of API usage.


Can I get data translated?

Yes. We can do that with customized scripts and a paid third-party Translation API to translate particular content fields, provided you are prepared to pay several times the General level cost, plus the cost of API usage. However, we do not recommend it, because there are many ways to achieve real-time translation. If your goal is SEO in another language, though, translating the data may be worthwhile.


What else can you do?

Well, anything relating to digital/online marketing, such as programmatic advertising, social media marketing, SEO, AdWords management, photography, video production, advertisement production, web design, graphic design, application development, etc. Digital marketing is our core business.