Index Models at the Schema Level for Web Data Search
Keywords:
Graph Summarization; Schema-level Graph Indices; Data Search
Abstract
Indexing the Web of Data offers many opportunities, in particular for finding and exploring data
sources. A key design decision when indexing the Web of Data is choosing an appropriate index model,
that is, how to index and summarize the data. Several index models have been developed, each for a
specific task. Because each index model is designed, implemented, and evaluated separately, it remains
difficult to judge whether an approach generalizes well to a different task, set of queries, or dataset.
This paper empirically evaluates six representative index models with different feature combinations.
One of these is a
novel index model that combines owl:sameAs and inferencing over RDFS. For the first time, we combine all
index models into a single stream-based architecture. Using two sizable, real-world datasets, we assess versions
of the index models that consider sub-graphs of size 0, 1, and 2 hops. We assess the quality of the
indices in terms of the compression ratio, the summarization ratio, and the F1-score, where the
F1-score measures the approximation quality of the stream-based index computation. The experiments
show considerable differences in compression ratio, summarization ratio, and approximation quality
across index models, queries, and datasets. Nonetheless, we observe meaningful relationships in the
results that help in selecting the appropriate index model for a given task, query type, and dataset.