There are still many opportunities for specialized niche websites serving vertical markets. With so much public information from horizontal markets available online today, it makes sense to aggregate collected data and expose it in different ways to serve those niches. This article briefly presents data collection and aggregation concepts, then moves on to data exposure, as determined by user or business needs. The final chapters present some aggregation-based websites from my own portfolio, with the ideas that led to the business decisions.
Overview
Aggregators commonly refer to websites or software programs that collect a specific type of data or information from multiple online or offline sources and expose it to users in new ways. This is typically a two-step process of data extraction and transformation. The best-known data aggregators collect and expose news, polls, movie or product reviews, and videos.
Many aggregators also filter data for a specific vertical or niche market. Even with so much content already created and exposed on the Internet by big portals, there are still plenty of opportunities to detect and implement use cases for specialized markets. A move from horizontal to vertical niche markets, through data collection and aggregation, must solve specific user or business needs. Very often, you get the idea for a new small or medium website by simply identifying your own needs in real life. Most of the websites I built as data aggregators for niche markets over the last couple of years came from one simple personal need that existing websites did not cover, or did not cover well enough for efficient search and good usability. Later I'll illustrate the design process of an online aggregator website through personal examples from my portfolio.
Data Collection
Aggregators make use of the huge amount of data publicly available on the web. The first step for your aggregator is a big initial import of data collected from different sources. This is usually done with custom crawlers or bots: automated software programs that visit websites, extract rows of data organized in tables or lists, and store them in a local database. The process may be hard to implement, because data is rarely exposed through HTML in a uniform and consistent structure. Each set of pages has its own layout. The process is easier if you can import static page content, with no JavaScript or AJAX and no POST requests.
But before anything else there is a major legal and moral issue to settle. You have to make sure there are no copyright issues and that you are allowed to parse that public data, by checking the Terms of Service and robots.txt. Even then, try to leave one second between page requests, so you do not overload the web server or cause a DoS (Denial of Service). Never import more than you need for your own website. And if you plan to expose portions of data imported from someone else's website, consider giving credit by mentioning the source and possibly providing a hyperlink to it. Nobody likes people "stealing" their data, but if site owners see that your new website provides new and different services, and that they actually benefit from the visitors you send them, they will tolerate the crawling process, the way they enjoy seeing GoogleBot visit their pages.
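As a minimal sketch of the extraction step described above, the following Python code pulls rows out of a static HTML table and stores them in a local SQLite database, using only the standard library. The `TableRowParser` class, the `products` table, and the sample markup are hypothetical names chosen for illustration; a real page will have its own layout and will need its own parsing rules.

```python
import sqlite3
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # skip header rows that contain only <th> cells
                self.rows.append(self._row)
            self._row = None


def store_rows(conn, rows):
    """Insert (name, price) rows into a local table, creating it if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products (name, price) VALUES (?, ?)",
        [(name, float(price)) for name, price in rows],
    )
    conn.commit()


# Hypothetical static page fragment standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Phone A</td><td>199.00</td></tr>
  <tr><td>Phone B</td><td>349.50</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)

conn = sqlite3.connect(":memory:")  # pass a file path for a persistent database
store_rows(conn, parser.rows)
print(conn.execute("SELECT name, price FROM products").fetchall())
```

In practice each source would get its own small parser like this one, all feeding the same local table.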
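The robots.txt check and the one-second pause can be sketched with Python's standard `urllib.robotparser` module. The bot name, the URLs, and the `fetch` callable below are assumptions made for illustration; the sketch takes the robots.txt text as a string, so it runs without network access. It does not replace reading the Terms of Service.

```python
import time
from urllib.robotparser import RobotFileParser


def load_robots(robots_txt):
    """Build a robots.txt checker from the file's text (already downloaded)."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return robots


def polite_crawl(urls, fetch, robots, user_agent, delay=1.0, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, pausing between requests."""
    pages = {}
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue              # disallowed by robots.txt: skip it
        if pages:
            sleep(delay)          # wait before every request except the first
        pages[url] = fetch(url)
    return pages


# Hypothetical robots.txt content and URLs, for illustration only.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
"""

robots = load_robots(ROBOTS_TXT)
urls = [
    "https://example.com/phones/1",
    "https://example.com/private/x",
    "https://example.com/phones/2",
]

# A stand-in fetch function; a real crawler would download the page here.
pages = polite_crawl(
    urls,
    fetch=lambda url: "<html>stub for %s</html>" % url,
    robots=robots,
    user_agent="MyAggregatorBot",
)
print(sorted(pages))
```

Passing `fetch` and `sleep` as parameters keeps the politeness logic separate from the download code, which also makes it easy to test without hitting any server.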

Data collection for aggregators makes sense when data is imported from multiple sources. Some sources may already expose data for the same vertical market, but usually you identify websites from the horizontal market. For instance, if you build a store for cellphones only, you may import some initial data from large websites that cover all sorts of electronics. It makes little sense, and is unethical at the very least, to import data from a future competitor, such as a website that sells cellphones only.
When you create a niche website, it also makes sense to enrich collected main data with some original content, and to import additional types of data, such as public videos captured on cellphones or user reviews.
Data Exposure
Think about new ways to expose data to your users, other than through views similar to those of existing vendors. For instance, you may provide a geographic search for a product that returns only the stores close to the user's location, possibly shown on a Google Map.
Many ideas for a vertical niche market come from a user's need for a specific view, or a way to expose and handle the information, that cannot be found on other websites. Or the idea comes simply because users would have to browse far too many websites with similar products, and you can offer a central point for the search, then send the users to the websites you indexed the information from.
Your new website should also take advantage of Web 2.0 themes and concepts: gadgets, modern web design, mobile support, social media plugins. Older successful websites frequently have a dated design and are slow and difficult to navigate. Whatever you want to do, always enhance, do not copy.
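One possible way to sketch such a geographic search is to compute the great-circle distance between the user and each store with the haversine formula, then keep only the stores within a given radius. The store records, coordinates, and function names below are hypothetical, and a production site would more likely rely on a spatial index in the database than on a linear scan.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def stores_near(stores, lat, lon, max_km):
    """Return the stores within max_km of (lat, lon), nearest first."""
    with_distance = [
        (haversine_km(lat, lon, store["lat"], store["lon"]), store)
        for store in stores
    ]
    nearby = [pair for pair in with_distance if pair[0] <= max_km]
    nearby.sort(key=lambda pair: pair[0])
    return [store for _, store in nearby]


# Hypothetical stores, for illustration only.
STORES = [
    {"name": "Store B", "lat": 45.50, "lon": -75.00},
    {"name": "Store C", "lat": 45.00, "lon": -74.95},
    {"name": "Store A", "lat": 45.01, "lon": -75.00},
]

names = [s["name"] for s in stores_near(STORES, 45.0, -75.0, max_km=10)]
print(names)  # Store A and Store C are within 10 km; Store B is not
```

The returned records still carry their coordinates, so they can be passed on to whatever map widget the site uses.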
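The "central point for the search" idea can be sketched as a single search over listings collected from several sources, where each result keeps a link back to the website it was indexed from. The `Listing` fields, shop names, and URLs here are hypothetical, and the substring matching is deliberately naive; a real site would use a full-text index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    title: str
    source_name: str   # website the listing was indexed from
    source_url: str    # link used to send the visitor back to that website


def search(listings, query):
    """Return listings whose title contains every word of the query."""
    words = query.lower().split()
    return [
        item for item in listings
        if all(word in item.title.lower() for word in words)
    ]


# Hypothetical listings aggregated from two sources.
LISTINGS = [
    Listing("Phone A 64GB", "Shop One", "https://shop-one.example/phone-a"),
    Listing("Phone A 64GB", "Shop Two", "https://shop-two.example/p/17"),
    Listing("Phone B 128GB", "Shop One", "https://shop-one.example/phone-b"),
]

for hit in search(LISTINGS, "phone 64gb"):
    print(hit.title, "->", hit.source_name, hit.source_url)
```

Keeping the source name and URL on every record is what lets the aggregator credit the original site and send visitors to it.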