jira at apache
Feb 29, 2012, 5:49 AM
[ https://issues.apache.org/jira/browse/SOLR-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219202#comment-13219202 ]
[jira] [Commented] (SOLR-3173) Database semantics - insert and update
Per Steffensen commented on SOLR-3173:
bq. Optimistic locking as a superset to insert/update:
bq. What I already had in mind:
bq. - update only a specific version of the document by specifying its exact version: _version_=12345
bq. - add a document only if it doesn't already exist (i.e. insert): _version_=-1
bq. - add a document regardless: don't specify a version
I still need a little time to evaluate to what extent _version_ can be used.
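To make the three quoted cases concrete, here is a minimal sketch of the check they imply. The function name `version_allows_write` and the use of `None` for "no version specified" and "no stored document" are assumptions made for illustration; this is not Solr's API.

```python
from typing import Optional


def version_allows_write(requested: Optional[int], stored: Optional[int]) -> bool:
    """Hypothetical check covering only the three quoted cases.

    requested: the _version_ sent by the client, or None if none was specified.
    stored:    the _version_ of the existing document, or None if it does not exist.
    """
    if requested is None:
        # No version specified: add the document regardless.
        return True
    if requested == -1:
        # Insert: only allowed when the document does not already exist.
        return stored is None
    # Exact version given: only allowed when it matches the stored version.
    return stored == requested
```

For example, `version_allows_write(-1, 12345)` rejects the write because a document is already stored, while `version_allows_write(None, 12345)` accepts it.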
bq. So now that I look at it again, it looks like what's missing is your "UPDATE" semantics which would only replace the record if it already existed (a weaker form of the first case... any positive version is OK). But I really wonder how useful those semantics are (only add a doc if it's overwriting an existing doc, regardless of what version or what data it contains?)
bq. If there are usecases, we certainly should be able to do it.
We need the only-insert-if-not-exists behaviour. The only-update-if-exists behaviour is mostly for consistency with what we know from RDBMSs. Basically it simulates what happens when you run the following in SQL with a unique constraint on the id column: 1) fails with a unique-key constraint error if a row with that id already exists, and 2) does not create the row/doc if it does not already exist.
1) INSERT INTO docs (id, column2, column3,...) VALUES (id-value, value2, value3,...)
2) UPDATE docs SET column2=value2, column3=value3, ... WHERE id=id-value
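The behaviour of the two statements above can be reproduced with a small sketch using Python's built-in `sqlite3` module. The reduced table layout (only `id` and `column2`) and the helper names `try_insert` and `try_update` are assumptions for illustration.

```python
import sqlite3

# In-memory database with a unique (primary key) constraint on id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, column2 TEXT)")
conn.execute("INSERT INTO docs (id, column2) VALUES (?, ?)", ("doc-1", "first"))


def try_insert(doc_id, value):
    """Return True if the row was inserted, False on a unique-key violation."""
    try:
        conn.execute("INSERT INTO docs (id, column2) VALUES (?, ?)", (doc_id, value))
        return True
    except sqlite3.IntegrityError:
        return False


def try_update(doc_id, value):
    """Return the number of rows changed; 0 means no such row existed."""
    cur = conn.execute("UPDATE docs SET column2 = ? WHERE id = ?", (value, doc_id))
    return cur.rowcount
```

Here `try_insert("doc-1", "again")` fails because the id is taken, and `try_update("missing", "x")` changes nothing and creates no row.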
RDBMS people are used to an update operation that does not create a row/document if it does not exist (for example because it has already been deleted). I will consider leaving that feature out. It is only there to give an experience consistent with what you are used to from RDBMSs. Seen from a distance, though, I think an "update" operation that creates things that do not exist is not logical (it simply does not follow from the word "update").
Right now I believe the solution will be the following URL extensions:
a) .../solr/.../update, the one already existing in Solr, with unchanged semantics.
b) .../solr/.../database/update, which updates the document if it already exists and does nothing if it does not. When versioning is activated (SOLR-3178), it only updates if the correct version is given, and gives a VersionConflict error if the document exists but the version is not correct.
c) .../solr/.../database/insert, which creates a new document if the document does not already exist, and fails with a DocumentAlreadyExists error if it does.
Then you can keep using Solr exactly as you are used to, and you can start using the new "database semantics" features if you want. I might create an optional config setting for DirectUpdateHandler2 that deactivates the functionality behind a). This can be used when you don't trust clients to use a) correctly in a setup where you want to ensure consistency under high concurrent load.
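The intended semantics of a), b) and c) can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class `InMemoryIndex`, its method names, the `versioning` flag and the integer version counter are all hypothetical and not part of Solr; only the error names come from the description above.

```python
class DocumentAlreadyExists(Exception):
    """Raised by database_insert when the id is already present."""


class VersionConflict(Exception):
    """Raised by database_update when the given version is not the current one."""


class InMemoryIndex:
    """Hypothetical stand-in for a Solr core keyed on its uniqueKey field."""

    def __init__(self, versioning=False):
        self.versioning = versioning  # models versioning being activated
        self.docs = {}                # id -> (version, fields)
        self._next_version = 1        # assumed: versions only increase

    def _store(self, doc_id, fields):
        version = self._next_version
        self._next_version += 1
        self.docs[doc_id] = (version, fields)
        return version

    def update(self, doc_id, fields):
        """a) Existing semantics: add the document, overwriting any old one."""
        return self._store(doc_id, fields)

    def database_update(self, doc_id, fields, version=None):
        """b) Replace only if the document exists; otherwise do nothing."""
        if doc_id not in self.docs:
            return None
        current_version, _ = self.docs[doc_id]
        if self.versioning and version != current_version:
            raise VersionConflict(doc_id)
        return self._store(doc_id, fields)

    def database_insert(self, doc_id, fields):
        """c) Create only if the document does not exist; otherwise fail."""
        if doc_id in self.docs:
            raise DocumentAlreadyExists(doc_id)
        return self._store(doc_id, fields)
```

With `index = InMemoryIndex(versioning=True)`, a second `index.database_insert("1", {...})` for the same id raises `DocumentAlreadyExists`, and `index.database_update("1", {...}, version=stale)` raises `VersionConflict`, while `index.update(...)` keeps behaving as before.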
bq. As far as what _version_ is, it's new and used for solrcloud to handle reorders of updates to replicas (among other things).
bq. The leader shard decides what the version of a document should be (versions only increase), and forwards the doc with the version to the replicas.
bq. If a replica receives the same doc with a lower version, it knows that it can safely drop it because it already has a newer version.
Cool. I understand a little better now. So no (Wiki) documentation written yet?
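As I read the quoted explanation, the replica-side handling of reordered updates could look roughly like the sketch below. The `Replica` class and its `receive` method are hypothetical names, and re-applying an update that carries the same version as the stored one is my assumption; the quote only covers the lower-version case.

```python
class Replica:
    """Hypothetical replica that keeps the highest version seen per document."""

    def __init__(self):
        self.docs = {}  # id -> (version, fields)

    def receive(self, doc_id, version, fields):
        """Apply an update forwarded by the leader; return False if dropped."""
        current = self.docs.get(doc_id)
        if current is not None and version < current[0]:
            # A newer version is already stored, so this reordered
            # update can safely be dropped.
            return False
        self.docs[doc_id] = (version, fields)
        return True
```

If version 2 of a document arrives before version 1, the replica stores version 2 and later drops version 1 when it shows up.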
> Database semantics - insert and update
> Key: SOLR-3173
> URL: https://issues.apache.org/jira/browse/SOLR-3173
> Project: Solr
> Issue Type: New Feature
> Components: update
> Affects Versions: 3.5
> Environment: All
> Reporter: Per Steffensen
> Assignee: Per Steffensen
> Labels: RDBMS, insert, nosql, uniqueKey, update
> Fix For: 4.0
> Original Estimate: 168h
> Remaining Estimate: 168h
> In order to increase the ability of Solr to be used as a NoSql database (lots of concurrent inserts, updates, deletes and queries in the entire lifetime of the index) instead of just a search index (first: everything indexed (in one thread), after: only queries), I would like Solr to support the following features inspired by RDBMSs and other NoSql databases.
> * Given a solr-core with a schema containing a uniqueKey-field "uniqueField" and a document Dold, when trying to INSERT a new document Dnew where Dold.uniqueField is equal to Dnew.uniqueField, then I want a DocumentAlreadyExists error. If no such document Dold exists I want Dnew indexed into the solr-core.
> * Given a solr-core with a schema containing a uniqueKey-field "uniqueField" and a document Dold, when trying to UPDATE a document Dnew where Dold.uniqueField is equal to Dnew.uniqueField I want Dold deleted from and Dnew added to the index (just as it is today). If no such document Dold exists I want nothing to happen (Dnew is not added to the index).
> The essence of this issue is to be able to state your intent (insert or update) and have slightly different semantics (from each other and from the existing update) depending on your intent.
> The functionality provided by this issue is only really meaningful when you run with "updateLog" activated.
> This issue might be solved more or less at the same time as SOLR-3178, and only one single SVN patch might be given to cover both issues.