Apr 22
Tutorial on the multiple ways to send HTTP web requests in .NET and process returned web results and result streams. Demos with the HttpWebRequest/HttpWebResponse classes, WebClient, ASP.NET Request/Response objects, and sending explicit HTTP requests at the socket level. Demystifying HTTP request and response headers.
Continue reading »
Apr 21
Extract data from compiled CHM Help files and, more generally, from any IStorage-based compound document. The file itself is treated as a component, because data extraction is performed through standard COM interfaces rather than through typical I/O operations on file streams. As a consequence, you do not need to know the raw storage format of these CFBF compound documents.
Continue reading »
Apr 21
Dynamically extract and enhance data from the CNN Sports Illustrated web page with the NHL standings. A very simple application of our generic web scraper base class. This demo focuses on why web data extraction is sometimes necessary and how extracted information can be improved with some added value.
Continue reading »
Apr 21
Parse PDF files and load their storage data structures into a custom object model. PDF files have a largely text-based structure (with embedded binary streams) in an openly documented format defined and controlled by Adobe. While the documentation for the PDF file format is publicly available, there are few free open-source tools for managing PDF documents. This is the first in a series of articles on PDF documents.
Continue reading »
Apr 20
Implementation of a .NET registry browser that emulates the behavior of Microsoft's RegEdit or RegEdt32.
Continue reading »
Apr 19
Generic hierarchical Object Browser for both .NET types and their instances. This prototype may serve as a demo and a base for other projects to come. It also provides a basis for both data and metadata representation. Unlike common static metadata browsers, which show types, members and relationships between types, this Instance Browser also dynamically auto-expands its nodes into child objects at run time.
Continue reading »
Apr 18
Automate extraction of genuine, system-related information from any file, including its size on disk and actual content size. Open the File Properties dialog with Win32 shell function calls. This is the first in a series of articles on all kinds of file information.
Continue reading »
Apr 18
Simple C# .NET base class for web scrapers based on sequential text parsing of static HTML pages, which will be used in many future projects and articles in this magazine. We cover basic navigation functionality, sequential static content parsing and data extraction. We add performance counters and tracing facilities. The first demo performs online, real-time scraping of some Yahoo! Movies tables.
Continue reading »
Apr 17
Some common cases where data extraction, especially through reverse engineering techniques, may be illegal. Are you allowed to extract and use any kind of data from software you own? The most common legal issues in web data extraction.
Continue reading »
Apr 16
Extraction of icon images from an .ICO icon group file. This is the first in a series of articles using the same FileIcon base class for icon extraction from different files, containers and repositories.
Continue reading »