Desktop Search Hackfest/Design discussion

Points:

95% of use cases should be sufficient. The find example:
(1) Usually you remember where a file is located; that is the 95% of use cases. When you don't, you use a recursive search tool, and it is extremely useful despite being very slow.
(2) If you replace the slow recursive tool with a fast indexer-based search, you lower the pain threshold and capture many more use cases than the slow tool did, so that "remember where a file is located" drops to much less than 95% of use cases.

There's a feedback loop: the better a particular piece of functionality is implemented, the more it is used, which in turn raises the share of use cases once considered corner cases, to the point where they become important.

Query language limits ontology, and this causes the ontology split between Nepomuk and Xesam: often you can model the underlying data using either a more natural and flexible approach, which needs an advanced QL, or a restricted approach, which works fine with a basic QL.

However, users of a simple QL get no benefit from the flexible representation, while losing the ability to query it efficiently even in cases covered by the simple model. This leads them to push for "simplification" of the ontology, which causes the ontology split.


Consequences of the split. Pros and cons of Xesam as Nepomuk base vs a Xesam<->Nepomuk adapter:
(1) A split in base ontologies causes a split in user/extension ontologies, and thus a split in the user base as well.
(2) Ontology development tends to diverge. Details lose coherence, which, coupled with stability requirements, means more compatibility headaches.
(3) Compatibility-by-wrapper tends to be reactive: changes, quirks etc. eventually get ported over to the wrapper. Compatibility-by-design is proactive.
(4) Parallel development of two ontologies means doubled effort and roughly twice as many bugs in each of the ontologies.
(5) Compatibility-by-wrapper encourages divergence by decoupling the ontology development processes from each other.

Nepomuk and Xesam mature at different paces, which is also encouraged by compatibility-by-wrapper.

Basically this means that convergence is impossible in the near and mid term. An ecosystem builds up and brings its own stability requirements.

SPARQL is hard to implement:
(1) SPARQL and SQL map very well onto each other. Most SPARQL features, and all the common ones, have a 100% mapping to SQL.
(2) Parsing SPARQL syntax is not strictly necessary. SPARQL is just one of many serializations of RDF graph templates. If there is a serialization that is easier to parse, even a non-standard one, it is very easy to map it to SPARQL and SQL for those who want them.

Thus you can think of SPARQL as glorified or prettified SQL, depending on which subset of SPARQL you implement.

(3) Xesam QL doesn't expose essential SQL functionality, namely JOINs
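
As a rough illustration of points (1) to (3), the sketch below stores RDF-style triples in a single SQLite table and translates a small graph template (a list of triple patterns, roughly what a SPARQL WHERE clause holds) into SQL. Each additional pattern becomes one more self-JOIN, which is the functionality point (3) says is not exposed. The table layout, the `to_sql` helper, the `?var` convention and the sample data are all assumptions made for this example; none of them come from Xesam or Nepomuk.

```python
import sqlite3


def to_sql(patterns, select):
    """Translate triple patterns into one SQL query over triples(s, p, o).

    Terms starting with '?' are variables; everything else is a constant.
    Each pattern becomes one alias of the triples table (a self-join).
    """
    first_seen = {}  # variable -> column where it was first bound
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable is the join condition.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # Constants are passed as bound parameters.
                where.append(f"{ref} = ?")
                params.append(term)
    columns = ", ".join(first_seen[v] for v in select)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Hypothetical sample data: two documents, each linked to a person resource.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("doc1", "author", "person1"),
    ("doc2", "author", "person2"),
    ("person1", "name", "Alice"),
    ("person2", "name", "Bob"),
])

# Roughly: SELECT ?doc WHERE { ?doc author ?who . ?who name "Alice" }
patterns = [("?doc", "author", "?who"), ("?who", "name", "Alice")]
sql, params = to_sql(patterns, ["?doc"])
rows = db.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The patterns are given here as plain Python tuples rather than parsed from SPARQL text, in the spirit of point (2): the graph template is what matters, not the serialization. A query language without JOINs could answer either pattern alone, but not the combination.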

Future-proofing the design, clean extensions vs hacks: hacks generally mean unintended or unnatural use of features, a "stretching" of the usage area. Usually this comes at the cost of flexibility and further extensibility.

Duplication across clients of functionality which should have been implemented once in the backend:
(1) One stable and efficient implementation vs a ton of buggy and slow ones. Also, backend devs are more efficient at building the required functionality than client devs.
(2) Bandwidth and latency between the storage and query parts of the backend vs between query and client. Fetching half of the DB over D-Bus and doing the data mining yourself is sloooow.
(3) Grep-like backends belong in a lower-level framework, such as a library that would implement a reasonably powerful QL over such backends. Otherwise you have to implement part of this library in each Xesam client application.
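
Point (2) can be made concrete with a toy comparison. The sketch below uses an in-memory SQLite table as a stand-in for the backend store and counts how many rows would have to travel to the client in each approach. The schema, the sample data and both helper functions are hypothetical; real costs depend on the backend and on D-Bus marshalling, which this does not model.

```python
import sqlite3

# Stand-in backend store with 1000 hypothetical file records.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT, mime TEXT, size INTEGER)")
db.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [(f"/home/user/f{i}", "image/png" if i % 10 == 0 else "text/plain", i)
     for i in range(1000)],
)


def client_side(db):
    """Weak QL: fetch every row, then filter in the client."""
    rows = db.execute("SELECT path, mime, size FROM files").fetchall()
    hits = [path for path, mime, size in rows
            if mime == "image/png" and size >= 500]
    return hits, len(rows)


def backend_side(db):
    """Expressive QL: the backend filters, only matches are transferred."""
    rows = db.execute(
        "SELECT path FROM files WHERE mime = ? AND size >= ?",
        ("image/png", 500),
    ).fetchall()
    return [path for (path,) in rows], len(rows)


client_hits, client_rows = client_side(db)
backend_hits, backend_rows = backend_side(db)
print(client_rows, backend_rows)
```

Both approaches find the same 50 files, but the client-side version moves all 1000 rows to get there, and every client application would carry its own copy of the filtering code.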

Simple vs easy approach: asm is simple, C is easy. Or rather, you can make something "simple" for developers but hard for users, or reasonably hard for developers and "easy" for users.

xesam:Author: linking up document authors, creators and senders, from the simple case of a string with a name to the complex case of a sender address. linking up
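
One way to picture the range of cases: the sketch below represents an author either as a plain name string or as a contact record with a name and an address, and treats both forms as the same person when searching. The `Contact` type, the `items_by` helper and the matching rule (case-insensitive comparison of names) are assumptions chosen for illustration; they are not taken from the Xesam or Nepomuk ontologies.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Contact:
    """Complex case: a sender with a display name and an address."""
    name: str
    address: str


# Simple case: the author is just a string holding a name.
Author = Union[str, Contact]


def author_name(author: Author) -> str:
    """Return the name for either representation of an author."""
    return author if isinstance(author, str) else author.name


def items_by(name: str, items: dict) -> list:
    """Return sorted ids of items whose author, in either form, has this name."""
    wanted = name.casefold()
    return sorted(item_id for item_id, author in items.items()
                  if author_name(author).casefold() == wanted)


# Hypothetical items: a document author, a mail sender and a photo creator.
items = {
    "report.odt": "Alice Smith",
    "mail-17": Contact("Alice Smith", "alice@example.org"),
    "photo.jpg": "Bob Jones",
}
found = items_by("alice smith", items)
print(found)
```

Matching on names alone is the crudest possible link and will confuse two people who share a name; it is used here only to show that a query has to reach both the string case and the address case.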