Extracting Data from NoSQL Databases - A Step towards Interactive Visual Analysis of NoSQL Data
Master's thesis
Businesses and organizations today generate increasing volumes of data. Being able to analyze and visualize this data to find trends that can inform business decisions is an important factor for competitive advantage. Spotfire is a software platform for doing this. Spotfire uses a tabular data model similar to the relational model used in relational database management systems (RDBMSs), which are commonly used by companies for storing data. Extracting and importing data from RDBMSs into Spotfire is generally a simple task. In recent years, because of changing application requirements, new types of databases under the general term NoSQL have become popular. NoSQL databases differ from RDBMSs mainly in that they use non-relational data models, lack explicit schemas, and scale horizontally. Some of these features cause problems for applications like Spotfire when extracting and importing data. This thesis investigates how these problems can be solved, thus enabling support for NoSQL databases in Spotfire. The approach and conclusions are valid for any application that interacts with databases in a way similar to Spotfire. General solutions for supporting NoSQL databases are suggested. In addition, two concrete tools for importing data from Cassandra and Neo4j, both implemented in the Spotfire platform, are described. The presented solutions comprise a data model mapping from the NoSQL system to Spotfire tables, sampling and possibly clustering for finding schemas, and an extraction mechanism tailored to the particular system's query interface. The suggested solutions are not claimed to be complete. Rather, the work in this thesis can serve as a starting point for more thorough investigations or as a basis for further extension.
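As an illustration of the sampling idea mentioned in the abstract, the following minimal Python sketch infers a column set from a random sample of schemaless records and then maps every record onto a table with those columns. It is not the implementation described in the thesis: the function names `infer_schema` and `to_table`, the in-memory list of dictionaries standing in for a NoSQL store, and the frequency-based column ordering are all assumptions made for this example.

```python
import random
from collections import Counter


def infer_schema(records, sample_size, seed=0):
    """Return the keys seen in a random sample, most frequent first.

    Hypothetical helper: a real system would sample through the
    database's own query interface rather than from a Python list.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    counts = Counter(key for record in sample for key in record)
    # Sort by descending frequency; break ties alphabetically so the
    # column order is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ordered]


def to_table(records, columns):
    """Map each record to a row; keys missing from a record become None."""
    return [[record.get(column) for column in columns] for record in records]


# Stand-in for schemaless data: records do not share the same keys.
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "name": "Cy", "email": "c@example.com"},
]

columns = infer_schema(records, sample_size=3)
table = to_table(records, columns)
```

A sample smaller than the full data set can miss keys that occur rarely, so the inferred column set is an approximation; this is one reason a sampling-based schema cannot be claimed to be complete.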
Information Technology