We are pleased to introduce StarfishQL to the Rust community today. StarfishQL is a graph database and query engine that enables graph analysis and visualization on the web. It is an experimental project whose primary purpose is to explore the dependency network of Rust crates published on crates.io.
StarfishQL is a framework that provides a graph database and a graph query engine that interacts with it.
A concrete example, Freeport, uses the crate dependency graph of crates.io for illustration. With this example, you can see StarfishQL in action.
Ultimately, we're interested in performing graph analysis, that is, extracting meaningful information from plain graph data. To achieve that, we believe visualization is a crucial aid.
StarfishQL's query engine is designed to incorporate different forms of visualization through a flexible query language. However, development so far has centred on the following, as showcased in our demo apps.
In general, a query engine takes input queries written in a specific query language (e.g. SQL statements), performs the necessary operations in the database, and then outputs the data of interest to the user application. You may also view a query engine as an abstraction layer such that the user can design queries simply in the supported query language and let the query engine do the rest.
In the case of a graph query engine, the output data is a graph.
In the case of StarfishQL, the query language is a custom language we defined in JSON, which makes the engine highly accessible and portable.
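To make the "query in, graph out" idea concrete, here is a minimal sketch of a graph query evaluated over an in-memory edge list. It does not use StarfishQL's JSON query language or its API; the `Query` and `Graph` types and the `run_query` function are hypothetical names chosen for illustration, and the traversal is a plain breadth-first search.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical query: "everything `root` depends on, up to `depth` hops away".
pub struct Query<'a> {
    pub root: &'a str,
    pub depth: usize,
}

/// The output of a graph query is itself a graph: a set of nodes plus edges.
#[derive(Debug, Default, PartialEq)]
pub struct Graph {
    pub nodes: BTreeSet<String>,
    pub edges: BTreeSet<(String, String)>,
}

/// Evaluate the query against (from, to) edge pairs using breadth-first search.
pub fn run_query(edges: &[(&str, &str)], query: &Query<'_>) -> Graph {
    // Index outgoing edges by their source node.
    let mut adjacency: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        adjacency.entry(from).or_default().push(to);
    }

    let mut result = Graph::default();
    result.nodes.insert(query.root.to_string());

    let mut queue = VecDeque::from([(query.root, 0usize)]);
    while let Some((node, dist)) = queue.pop_front() {
        if dist == query.depth {
            continue; // do not expand beyond the requested depth
        }
        for &next in adjacency.get(node).into_iter().flatten() {
            result.edges.insert((node.to_string(), next.to_string()));
            // Enqueue a node only the first time it is seen.
            if result.nodes.insert(next.to_string()) {
                queue.push_back((next, dist + 1));
            }
        }
    }
    result
}
```

Calling `run_query` with a root crate and a depth of 2 on a dependency chain returns the root, its direct dependencies, and their dependencies, together with the edges connecting them. A visualization layer can render that subgraph directly.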
In the example of Freeport, StarfishQL consists of the following three components.
Graph Query Engine
The engine listens at the following endpoints for the corresponding operation:
You could also invoke the endpoints above programmatically.
Graph data are stored in a relational database:
- Metadata - Definition of each entity and relation, e.g. attributes of crates and dependencies
- Node Data - An instance of an entity, e.g. crate name and version number
- Edge Data - An instance of a relation, e.g. one crate depends on another
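The three kinds of data above can be sketched as plain Rust types, with vectors standing in for relational tables. This is only an illustration of how metadata can constrain node and edge rows; the type names, fields, and validation rules are assumptions and do not reflect StarfishQL's actual schema.

```rust
use std::collections::HashMap;

/// Metadata row: defines an entity (or relation) and the attributes it may carry.
#[derive(Debug, Clone)]
pub struct Definition {
    pub name: String,
    pub attributes: Vec<String>,
}

/// Node data row: one instance of an entity, e.g. a crate and its version.
#[derive(Debug, Clone)]
pub struct Node {
    pub entity: String,
    pub name: String,
    pub attributes: HashMap<String, String>,
}

/// Edge data row: one instance of a relation between two nodes.
#[derive(Debug, Clone)]
pub struct Edge {
    pub relation: String,
    pub from: String,
    pub to: String,
}

/// In-memory stand-in for the relational tables (hypothetical layout).
#[derive(Debug, Default)]
pub struct GraphStore {
    pub entities: Vec<Definition>,
    pub relations: Vec<Definition>,
    pub nodes: Vec<Node>,
    pub edges: Vec<Edge>,
}

impl GraphStore {
    /// Insert a node after checking it against the entity metadata.
    pub fn insert_node(&mut self, node: Node) -> Result<(), String> {
        let def = self
            .entities
            .iter()
            .find(|d| d.name == node.entity)
            .ok_or_else(|| format!("unknown entity `{}`", node.entity))?;
        if let Some(key) = node.attributes.keys().find(|k| !def.attributes.contains(*k)) {
            return Err(format!("undeclared attribute `{}`", key));
        }
        self.nodes.push(node);
        Ok(())
    }

    /// Insert an edge after checking the relation and both endpoints exist.
    pub fn insert_edge(&mut self, edge: Edge) -> Result<(), String> {
        if !self.relations.iter().any(|d| d.name == edge.relation) {
            return Err(format!("unknown relation `{}`", edge.relation));
        }
        for end in [&edge.from, &edge.to] {
            if !self.nodes.iter().any(|n| &n.name == end) {
                return Err(format!("unknown node `{}`", end));
            }
        }
        self.edges.push(edge);
        Ok(())
    }
}
```

In this sketch the metadata must be defined first: a node is rejected if its entity or any of its attributes is undeclared, and an edge is rejected if its relation or either endpoint is missing.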
To obtain the crate data to insert into the database, we used a fast, non-disruptive crawler on a local clone of the public index repo of crates.io.
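The crawler itself is not described here, but any tool that reads a local clone of the index has to locate each crate's file. The sketch below follows the directory layout documented for Cargo registry indexes; the function name `index_path` is hypothetical and is not part of StarfishQL.

```rust
/// Relative path of a crate's file inside a registry index checkout,
/// following the layout documented for Cargo registry indexes.
/// Returns `None` for empty or non-ASCII names.
pub fn index_path(crate_name: &str) -> Option<String> {
    if crate_name.is_empty() || !crate_name.is_ascii() {
        return None;
    }
    // Index file names are lowercase even when the crate name is not.
    let name = crate_name.to_ascii_lowercase();
    let path = match name.len() {
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        // Three-letter names are grouped by their first character.
        3 => format!("3/{}/{}", &name[..1], name),
        // Longer names are grouped by their first two and next two characters.
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    };
    Some(path)
}
```

Each file found this way contains one JSON record per published version of the crate, which is where a crawler would read names, versions, and dependencies.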
Here are some interesting findings we made during the process.
As of March 30, 2022
StarfishQL allows flexible and portable definition, manipulation, retrieval, and visualization of graph data.
The graph query engine, built in Rust, provides a convenient interface for any web application to access data in the relational graph database with stable performance and memory safety.
Admittedly, StarfishQL is still in its infancy, so every detail of its design and implementation is subject to change. The upside is that, as with other open-source projects in the Rust community, you can contribute to it if you find the concept interesting. With its addition to the SeaQL ecosystem, together we are one step closer to the vision of Rust for data engineering.
StarfishQL is created by the following SeaQL team members:
We are super excited to be selected as a Google Summer of Code 2022 mentor organization!