cassandra - What does the method setRepeatLastToken do in Astyanax?
I'm looking at the data reading recipes and examples in the Astyanax documentation. Some of them (e.g. querying rows with a callback) include
setRepeatLastToken(false)
Can anyone explain what it is used for? When should I use it? It looks like it defaults to true.
Link to the Javadoc: http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/query/allrowsquery.html#setrepeatlasttoken(boolean)
The source code of com.netflix.astyanax.query.AllRowsQuery includes the following comment:
 * There are a few important implementation details that need to be considered.
 * This implementation assumes that the random partitioner is used. Consequently
 * the KeyRange query is done using tokens and not row keys. This is done because
 * when using the random partitioner tokens are sorted while keys are not.
 * However, because multiple keys could potentially map to the same token each
 * incremental query to Cassandra will repeat the last token from the previous
 * response. This will ensure that no keys are skipped. This does however have
 * two important implications. First, the last and potentially more (if they
 * have the same token) row keys from the previous response will repeat. Second,
 * if a range of repeating tokens is larger than the block size then the code
 * will enter an infinite loop. This can be mitigated by selecting a block size
 * that is large enough so that the likelihood of this happening is very low.
 * Also, if your application can tolerate the potential for skipped row keys
 * then call setRepeatLastToken(false) to turn off this feature.
I understand that the query is done based on a token range instead of a key range. But why would rows potentially be skipped if the token wasn't repeated?
The source code comment pretty much explains the functionality of setRepeatLastToken(boolean). Here are the details:
According to this post, Cassandra uses the MD5 or Murmur hash algorithm (depending on the Cassandra version) for generating tokens from keys. Both of these algorithms are fast but can generate collisions (the same token value for different keys). Because of that, there might be multiple rows stored under the same token (usually only if the data set is large enough).
Cassandra stores data on nodes based on tokens. When using the random partitioner, data retrieval is done in token order (not key order). This makes sense because records are read from the same node(s) in sequence, which generates less traffic than retrieving records from random nodes in the cluster.
When reading from Cassandra using Astyanax paging, the page (block) boundary may fall in the middle of a set of rows with the same token. When the request for the next page comes, Astyanax needs to know whether to start from the next token (and possibly miss the rest of the rows with the same token that didn't fit in the last page) or to repeat the last token to make sure all rows for that token are read (at the cost of repeating one or possibly more rows from the previous page).
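To make the trade-off concrete, here is a minimal sketch in plain Java. It is a toy model, not Astyanax's actual code: the class TokenPagingSketch, the Row type, the sample ring and the inclusive/exclusive range handling are all assumptions made up for illustration. Rows c and d share token 30, and a page size of 3 cuts that group in half.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenPagingSketch {
    // Hypothetical row: the partitioner token plus the row key.
    static final class Row {
        final long token;
        final String key;
        Row(long token, String key) { this.token = token; this.key = key; }
    }

    // Sample data in token order; "c" and "d" collide on token 30.
    static List<Row> sampleRing() {
        return Arrays.asList(new Row(10, "a"), new Row(20, "b"),
                new Row(30, "c"), new Row(30, "d"), new Row(40, "e"));
    }

    // Simulates one range query: at most `limit` rows, in token order, with
    // token >= start (inclusive) or token > start (exclusive).
    static List<Row> fetchPage(List<Row> ring, long start, boolean inclusive, int limit) {
        List<Row> page = new ArrayList<>();
        for (Row r : ring) {
            boolean inRange = inclusive ? r.token >= start : r.token > start;
            if (inRange && page.size() < limit) page.add(r);
        }
        return page;
    }

    // Reads the whole ring page by page and returns the keys in the order seen.
    static List<String> readAll(List<Row> ring, int pageSize, boolean repeatLastToken) {
        List<String> keys = new ArrayList<>();
        long start = Long.MIN_VALUE;
        boolean inclusive = true;
        while (true) {
            List<Row> page = fetchPage(ring, start, inclusive, pageSize);
            for (Row r : page) keys.add(r.key);
            if (page.size() < pageSize) return keys; // short page: end of the ring
            long last = page.get(page.size() - 1).token;
            if (repeatLastToken && last == start) {
                // A full page of a single token would be requested again forever.
                throw new IllegalStateException("page size too small for token " + last);
            }
            start = last;
            inclusive = repeatLastToken; // repeat the last token, or move past it
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(sampleRing(), 3, true));  // [a, b, c, c, d, e, e]
        System.out.println(readAll(sampleRing(), 3, false)); // [a, b, c, e]
    }
}
```

With the token repeated, every key is returned but c and e show up twice, so the caller has to tolerate or filter duplicates. Without it, d is silently skipped because it shares a token with the last row of the first page.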
The code comment warns that if the page size is so small that the rows sharing one token don't all fit into a single page, the code may enter an infinite loop when setRepeatLastToken is set to true.
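The infinite loop can be illustrated with another small sketch, again a simplified model and not the library's implementation (the class RepeatTokenLoopSketch and the method countPages are hypothetical, and a page size of at least 1 is assumed). Each request asks for tokens greater than or equal to the last token of the previous page. If a full page consists of a single token, the next request is identical to the previous one and paging never advances. In this toy model that already happens when the group of equal tokens is exactly as large as the page.

```java
import java.util.Arrays;

class RepeatTokenLoopSketch {
    // Returns the number of pages needed to read all tokens with
    // "repeat last token" paging, or -1 if paging would loop forever.
    static int countPages(long[] sortedTokens, int pageSize) {
        int pages = 0;
        long start = Long.MIN_VALUE;
        while (true) {
            final long from = start;
            // One simulated request: tokens >= from, capped at pageSize.
            long[] page = Arrays.stream(sortedTokens)
                    .filter(t -> t >= from)
                    .limit(pageSize)
                    .toArray();
            pages++;
            if (page.length < pageSize) return pages; // short page: done
            long last = page[page.length - 1];
            if (last == start) return -1; // the next request would be identical
            start = last;
        }
    }

    public static void main(String[] args) {
        // Two rows share token 20; a page of 3 can step past them.
        System.out.println(countPages(new long[] {10, 20, 20, 30}, 3));     // 3
        // Three rows share token 20; a page of 3 never gets past them.
        System.out.println(countPages(new long[] {10, 20, 20, 20, 30}, 3)); // -1
    }
}
```

This is why the comment recommends a block size large enough that a run of equal tokens is very unlikely to fill a whole page.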
I hope this helps someone else who might be wondering about this feature.