Allowed Reverse-Engineering Situations
So why aren't data extraction and reverse-engineering methods banned altogether? Why are so many reverse-engineering methods and products publicly available? Well, despite the legal issues involved in the specific cases described before, there are plenty of other situations where reverse-engineering methods save the day.
Suppose your database or text document is damaged, and your applications can no longer use it. No tools are available to recover what is still intact in that file. If you know the format the file uses, you can create your own tool to repair it, or at least to recover parts of it. Yes, many companies use proprietary undocumented formats, and you're not allowed to "guess" them. But when you do it simply to recover your own data, nobody can blame you.
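As a rough illustration of such a home-made recovery tool, here is a minimal Python sketch. The record layout is purely hypothetical and assumed only for this example: a two-byte marker "RC", a two-byte big-endian length, the payload, and a one-byte checksum. The names MAGIC, pack_record and recover_records are likewise invented. The idea is simply to scan the damaged bytes for the marker and keep every record whose checksum still matches.

```python
import struct

MAGIC = b"RC"  # hypothetical record marker, assumed for this sketch


def pack_record(payload: bytes) -> bytes:
    """Encode one record: marker, 2-byte big-endian length, payload, checksum."""
    checksum = sum(payload) % 256
    return MAGIC + struct.pack(">H", len(payload)) + payload + bytes([checksum])


def recover_records(data: bytes) -> list[bytes]:
    """Scan possibly damaged data and return every payload whose checksum matches."""
    recovered = []
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)
        if pos == -1:
            break  # no further marker in the file
        header_end = pos + 4
        if header_end > len(data):
            break  # not enough bytes left for a complete header
        (length,) = struct.unpack(">H", data[pos + 2:header_end])
        end = header_end + length  # index of the checksum byte
        if end < len(data) and data[end] == sum(data[header_end:end]) % 256:
            recovered.append(data[header_end:end])
            pos = end + 1  # continue right after this intact record
        else:
            pos += 1  # damaged or false marker: resume scanning one byte later
    return recovered
```

A real format will need its own marker, length and integrity rules, but the scan-and-validate loop usually stays the same: anything that fails validation is skipped instead of aborting the whole recovery.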
Another situation is when you move from one application to a similar product that uses a different file format. You simply want to transfer your data, but product vendors frequently do not provide such tools, because doing so would help the competition. Once again, we are talking about getting access to a static file format - which could be proprietary. Do it for your own use only, and nobody is affected.
Plenty of software applications are freely offered and distributed today. There is no copyright on them, but the source code may not be written in a programming language you know. For instance, the VB.NET source code of a .NET application is freely available, but you're a C# developer. What you can do is build an executable assembly from the VB.NET source code, then use Reflector or another decompiler to turn it into equivalent C# code (ILDASM, by contrast, shows only the intermediate language). This also applies when no source code is available: just take the executable assembly and decompile it to your preferred language.
I knew someone who had his database hosted remotely by an ASP (Application Service Provider). When the time came to drop the provider's services, no tool was available to collect his online data and save it locally, for later use with another provider or with his own application. What he did was write a web parser - similar to the crawlers or spiders used by search engines - and automatically collect the information from his thousands of web pages. This kind of automatic web data extraction is perfectly legal when it concerns data you own, or when the provider has no objection.
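The original parser is not available, so what follows is only a minimal sketch of the idea, under the assumption that each page presents the data as an HTML table. It uses Python's standard html.parser module; the names TableRowExtractor and extract_rows are invented for this example. A real tool would first download each page (for instance with urllib.request) and then feed the text to the extractor.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without data cells, such as header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html: str) -> list[list[str]]:
    """Return the data rows found in one page of HTML."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling extract_rows on the text of each downloaded page and appending the results to a local file or database is then enough to rebuild an offline copy of the data.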
In fact, the search engines just mentioned are perfect examples of software applications that make intensive use of reverse-engineering techniques on web content. Not only do most site owners have no objection, but the large majority are happy when Google or Yahoo robots visit them, because their sites get more exposure.
We already talked about intellectual property associated not just with software programs, but also with music or literature. On the web, the common term for this kind of data is multimedia. There are many perfectly legal situations where you need to extract a few picture frames from a video file, take screenshots of an e-book, or extract a short audio clip from a song. Once again, this can be done if you know the file format or the object model of the components used to access and process that kind of data.
Data extraction is frequently followed by some data transformation. Sometimes the goal is simply to present the data in a way that suits you better. Syntax highlighters for programming languages help you understand your code better, because language keywords, class names and string values are shown in different colors. You extract data, but also associate meaning with different terms, based on their place in the text flow.
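To make the highlighter idea concrete, here is a small Python sketch. It is a deliberately naive classifier, not how any particular editor works: the keyword list is a tiny illustrative subset, a capitalized identifier is simply assumed to be a class name, and the names KEYWORDS, classify and highlight are invented for this example.

```python
import re

# Small illustrative subset of keywords, assumed for this sketch.
KEYWORDS = {"class", "def", "return", "if", "else", "import"}

# One alternative per token type; anything unrecognized falls into "other".
TOKEN_RE = re.compile(
    r"""
    (?P<string>"[^"\n]*"|'[^'\n]*')
  | (?P<word>[A-Za-z_]\w*)
  | (?P<other>\s+|.)
    """,
    re.VERBOSE | re.DOTALL,
)

# ANSI escape codes used to color each kind of token in a terminal.
COLORS = {"keyword": "\033[34m", "class_name": "\033[32m", "string": "\033[31m"}
RESET = "\033[0m"


def classify(source: str) -> list[tuple[str, str]]:
    """Split source text into (kind, text) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if match.lastgroup == "string":
            kind = "string"
        elif match.lastgroup == "word" and text in KEYWORDS:
            kind = "keyword"
        elif match.lastgroup == "word" and text[0].isupper():
            kind = "class_name"  # naive assumption: capitalized means class
        else:
            kind = "plain"
        tokens.append((kind, text))
    return tokens


def highlight(source: str) -> str:
    """Return the source with ANSI colors wrapped around classified tokens."""
    parts = []
    for kind, text in classify(source):
        color = COLORS.get(kind)
        parts.append(f"{color}{text}{RESET}" if color else text)
    return "".join(parts)
```

The extraction step (splitting the text into tokens) and the transformation step (attaching a color to each kind) are kept separate on purpose, so the same token list could feed an HTML renderer instead of a terminal.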
Aggregators combine data extracted from different places, and possibly from different kinds of repositories, into more uniform views. Search engine directories are a perfect example of aggregators, and RSS readers are another. Aggregators are like index systems. They generally use someone else's data, but help people find it and, in the end, present it in a better way.
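A very small aggregator along these lines can be sketched in Python with the standard library alone. The sketch assumes plain RSS 2.0 documents in which every item carries a pubDate with an explicit time zone; the function names parse_feed and aggregate are invented for this example, and downloading the feeds is left out.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_feed(xml_text: str) -> list[dict]:
    """Extract the items of one RSS 2.0 document as plain dictionaries."""
    root = ET.fromstring(xml_text)
    source = root.findtext("channel/title", default="")
    items = []
    for item in root.iter("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Assumption: every item has a pubDate in RFC 822 format.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def aggregate(feeds: list[str]) -> list[dict]:
    """Merge the items of several feeds into one list, newest first."""
    merged = [entry for xml_text in feeds for entry in parse_feed(xml_text)]
    merged.sort(key=lambda entry: entry["published"], reverse=True)
    return merged
```

Each entry keeps the title of the feed it came from, so the combined view can still point readers back to the original source.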