Determining query equivalence between newly registered query and already executing query (permalink)
The corresponding challenge is #106, which contributes to scenario #16. This challenge is an extension of challenge #84.
Problem
The Solid Stream Aggregator is a service that aggregates data streams from Solid pods and stores the aggregation in another pod. This aggregation uses the LDES in LDP specification. The user can register queries into the aggregator and generate a continuous materialized view over the streams stored in the pods. Moreover, multiple users can register the same query to the aggregator. On registering the same query, the aggregator currently instantiates a new process to compute and develop a continuous view. However, such an approach is resource-consuming and hinders the scalability of the aggregator. Therefore, there is a need for the aggregator to determine if a newly registered query is similar to an already executing query, and if so, not execute the query again but reuse the results of the already executing query. The Query Registry component of the aggregator keeps a record of the queries that are being executed currently and can be employed to identify query equivalence between a newly registered query and an already executing one.
Approved solution
We developed a library to determine the equivalence between two RSP-QL queries. RSP-QL is an extension of SPARQL to support continuous querying of data streams. These queries can work with timestamp-based RDF Stream Processing (RSP) engines. We developed this demo to show how the library works.
We made the following important technological decisions and assumptions:
- Since we are only interested in demonstrating the functionality of the Query Registry of the Solid Stream Aggregator, our demo doesn't include the actual aggregation of the different data streams. In case you wish to run the aggregator with pods and are interested in the aggregation results, you can check out this demo.
User flow
Actors/actresses
- User of the demo
Preconditions
- The user has Node.js installed.
Steps
- Clone the repository via
shell
git clone https://github.com/argahsuknesib/query-equivalence-demo.git
- Navigate to the folder
query-equivalence-demovia
shell
cd query-equivalence-demo
- Install the dependencies via
shell
npm i
- Start the Community Solid Server instance that hosts data used by the queries via
shell
npm run start-solid-server
- Start the aggregator in a separate terminal via
shell
npm run start demo
- Register the first query
sparql
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX dahccsensors: <https://dahcc.idlab.ugent.be/Homelab/SensorsAndActuators/>
PREFIX : <https://rsp.js/>
REGISTER RStream <output> AS
SELECT (AVG(?o) AS ?averageHR1)
FROM NAMED WINDOW :w1 ON STREAM <http://localhost:3000/dataset_participant1/data/> [RANGE 10 STEP 2]
WHERE{
WINDOW :w1 { ?s saref:hasValue ?o .
?s saref:relatesToProperty dahccsensors:wearable.bvp .}
}
with the aggregator in a new terminal via
shell
npm run query-one
The aggregator outputs the message
The query you have registered is not already executing. 7. Register the second query
sparql
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX dahccsensors: <https://dahcc.idlab.ugent.be/Homelab/SensorsAndActuators/>
PREFIX : <https://rsp.js/>
REGISTER RStream <output> AS
SELECT (AVG(?timestamp) AS ?averageTimestamp)
FROM NAMED WINDOW :w1 ON STREAM <http://localhost:3000/dataset_participant1/data/> [RANGE 10 STEP 2]
WHERE{
WINDOW :w1 { ?s saref:hasTimestamp ?timestamp .}
}
with the aggregator via
shell
npm run query-two
The aggregator outputs the same message
The query you have registered is not already executing.
Both queries are on the same data stream source, but the basic graph patterns are not isomorphic. Therefore, the aggregator registers both queries. 8. Register the third query
sparql
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX dahccsensors: <https://dahcc.idlab.ugent.be/Homelab/SensorsAndActuators/>
PREFIX : <https://rsp.js/>
REGISTER RStream <output> AS
SELECT (AVG(?o) AS ?averageHR1)
FROM NAMED WINDOW :w1 ON STREAM <http://localhost:3000/dataset_participant1/data/> [RANGE 10 STEP 2]
WHERE{
WINDOW :w1 { ?s saref:hasValue ?o .
?s saref:relatesToProperty dahccsensors:wearable.heartRate .}
}
with the aggregator via
shell
npm run query-three
The aggregator outputs the message
The query you have registered is not already executing. 9. Register the fourth query
sparql
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX dahccsensors: <https://dahcc.idlab.ugent.be/Homelab/SensorsAndActuators/>
PREFIX : <https://rsp.js/>
REGISTER RStream <output> AS
SELECT (AVG(?o) AS ?averageHR1)
FROM NAMED WINDOW :w1 ON STREAM <http://localhost:3000/dataset_participant1/data/> [RANGE 10 STEP 2]
WHERE{
WINDOW :w1 {
?subject saref:relatesToProperty dahccsensors:wearable.heartRate .
?subject saref:hasValue ?object .
}
}
with the aggregator via
shell
npm run query-four
The aggregator outputs the message
The query you have registered is already executing.
These queries are on the same data stream source and the basic graph patterns are isomorphic. Therefore, the aggregator will execute only one of the queries. 10. Register the fifth query
```sparql
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX dahccsensors: <https://dahcc.idlab.ugent.be/Homelab/SensorsAndActuators/>
PREFIX : <https://rsp.js/>
REGISTER RStream <output> AS
SELECT (AVG(?object) AS ?averageHR1)
FROM NAMED WINDOW :w1 ON STREAM <http://localhost:3000/dataset_participant1/data/> [RANGE 10 STEP 2]
WHERE{
WINDOW :w1 {
?subject saref:relatesToProperty dahccsensors:wearable.Accelerometer .
?subject saref:hasValue ?object .
}
}
```
with the aggregator via
```shell
npm run query-five
```
The aggregator outputs the message
> The query you have registered is not already executing.
- Register the sixth query
sparql
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX dahccsensors: <https://dahcc.idlab.ugent.be/Homelab/SensorsAndActuators/>
PREFIX : <https://rsp.js/>
REGISTER RStream <output> AS
SELECT (AVG(?object) AS ?averageHR1)
FROM NAMED WINDOW :w1 ON STREAM <http://localhost:3000/dataset_participant2/data/> [RANGE 10 STEP 2]
WHERE{
WINDOW :w1 {
?subject saref:relatesToProperty dahccsensors:wearable.Accelerometer .
?subject saref:hasValue ?object .
}
}
with the aggregator via
```shell npm run query-six
```
The aggregator outputs the message
The query you have registered is not already executing.
For these two queries, the basic graph patterns are isomorphic, but the data stream sources are different. In this case, the aggregator will execute both queries.
Postconditions
None.
Follow-up actions
None.
Future work
- Support query containment as well as sharing of intermediate RDF result sharing between data streams.
Lessons learned about developer experience
None.
Contributors
- Challenge: Kushagra Singh Bisen
- Solution: Kushagra Singh Bisen
- Report: Pieter Heyvaert