Text Mining Platform
Text Mining Platform is a semi-automatic application that searches, processes, and presents industry-related data to any consumer.
Project tech usage
The idea was to create a pipeline that could pull articles from the Internet and deliver the extracted knowledge. The pipeline consists of steps, or components: each component consumes data items of a certain type and produces data items of another type. The extracted knowledge is saved to a database and then arranged for any possible use.
1. Data Retrieval
Performs periodical and/or real-time retrieval of data from external data sources using available APIs as well as HTML content scraping techniques.
2. Data Processing
Analyses, categorises, and stores the retrieved data in the manner most efficient for presentation to the end users.
3. Presentation API
A set of RESTful services that allow the frontend to consume the available data on a per-user basis.
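The component contract described above, where each stage consumes data items of one type and produces items of another, could be sketched as follows. The names `DataItem`, `Component`, and `run_pipeline` are illustrative, not taken from the actual codebase:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative data item: every item carries a payload plus the
# producing component's confidence estimate (used later for curation).
@dataclass
class DataItem:
    kind: str            # e.g. "raw_article", "named_entity"
    payload: dict
    confidence: float = 1.0

# A pipeline stage consumes items of one kind and produces items of another.
@dataclass
class Component:
    consumes: str
    produces: str
    process: Callable[[DataItem], list]

def run_pipeline(components: list, items: list) -> list:
    """Push data items through each stage in order.

    Persistence of intermediate results (which the real system does
    at every stage) is omitted here for brevity.
    """
    for comp in components:
        next_items = []
        for item in items:
            if item.kind == comp.consumes:
                next_items.extend(comp.process(item))
        items = next_items
    return items
```

A trivial stage can then be wired in by constructing a `Component` whose `process` callable maps one item kind to the next.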
How we approached the project
After facts and events are extracted, an event stream is created for each user and passed to their UI. Every component of the pipeline persists its results.
Each pipeline component consumes and produces data items of specific types. The system consists of the following components:
Named entity recognizer;
Temporal expression extractor;
Named entity disambiguator.
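To illustrate the kind of work the temporal expression extractor performs, here is a minimal regex-based sketch. A production extractor would rely on a dedicated linguistic library and normalize the matches to calendar dates; the patterns below are purely illustrative:

```python
import re

# A few date-like patterns; real systems handle far more variation
# and normalize each match to an ISO date.
TEMPORAL_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",                                          # 2024-03-15
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2}, \d{4}\b",
    r"\b(?:yesterday|today|tomorrow)\b",
]

def extract_temporal_expressions(text: str) -> list:
    """Return every substring of `text` matching a temporal pattern."""
    found = []
    for pattern in TEMPORAL_PATTERNS:
        found.extend(re.findall(pattern, text, flags=re.IGNORECASE))
    return found
```

Each extracted expression would then become a data item with its own confidence score, like any other component output.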
Because the system relies on linguistic libraries that can produce errors, their output should be monitored and corrected when necessary; where possible, the libraries' internal algorithms should also be adjusted to prevent similar errors in the future.
Each component provides a confidence estimate for every output data item. Based on this estimate, a data item can be passed to the curation UI before proceeding to the next pipeline stage.
The curator has several options:
If the curator approves the data item, it is passed to the next component;
If the curator corrects the data item, adjustments are made to the processing component and the data item is reprocessed. The curator can then review the reprocessed items and approve them if the confidence level is acceptable;
The curator can discard the data item entirely if it is incorrect; in that case it is not passed to the next component of the pipeline.
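The curation flow above can be sketched as a small routing function. The threshold value and the `Verdict` names are assumptions for illustration, not the system's actual configuration:

```python
from enum import Enum
from typing import Optional

class Verdict(Enum):
    APPROVE = "approve"
    CORRECT = "correct"
    DISCARD = "discard"

CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff, not from the actual system

def route(confidence: float, verdict: Optional[Verdict] = None) -> str:
    """Decide what happens to a data item between pipeline stages.

    High-confidence items pass straight through; low-confidence ones
    wait for a curator verdict (approve / correct-and-reprocess / discard).
    """
    if confidence >= CONFIDENCE_THRESHOLD:
        return "next_stage"
    if verdict is Verdict.APPROVE:
        return "next_stage"
    if verdict is Verdict.CORRECT:
        return "reprocess"      # adjust the component, run the item again
    if verdict is Verdict.DISCARD:
        return "dropped"
    return "curation_queue"     # awaiting a curator decision
```

The key design point is that curation sits between stages, so a corrected item re-enters the same component rather than contaminating downstream results.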
Our Clients Say
I had the pleasure of working with Sergey and Uladzimir back when Desk-Net was a PowerPoint deck. The guys helped me come up with a prototype to validate my vision, and then took on implementing and launching the first version of our product. I'd recommend them to any startup out there – they really take care of technology and help stakeholders stay focused on business.
Matthias Kretschmer, CEO & Founder at Desk-Net.com
I would definitely recommend working with Geomotiv’s team for data, platform, middle tier API, and back-end systems. Pros: diligence and persistence in achieving code quality standards set by our team as part of addressing our code review comments. There is also dedication in solving complex technical challenges.
David Peterson, SVP Technology at Rubicon Project
I hired Geomotiv to save a project after our US developer did a terrible job. Sergey and his company helped turn the project around, not only doing what we asked but also making suggestions that made the product much stronger.
Lloyd Melnick, CEO at FIVEONENINE GAMES