This Week in Elasticsearch and Apache Lucene - July 7, 2015
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Everything you want to know about Found’s hosted #Elasticsearch service - register here: https://t.co/zVQS10zBZV
— Found by Elastic (@foundsays) July 6, 2015
Elasticsearch Core
- Packaging: Don't jarhell check system jars (#11979, 2.0.0)
- Exceptions: Parameterized exception messages (#11981, 2.0.0)
- Internal: really ban `exitVM` with security policy (#11982, 2.0.0)
- Transport: Do not make the buffer skip while a stream is open (#11988, 2.0.0)
- Mapping: Move short name access out of field type (#11977, 2.0.0)
- Packaging: Don't add CWD to classpath when `ES_CLASSPATH` isn't set (#12001, 2.0.0)
- Exceptions: Promote headers to first class citizens on exceptions (#12006, 2.0.0)
- Status: Replacing sigar (#11995, 2.0.0)
- Exceptions: Don't special-case on `ElasticsearchWrapperException` in toXContent (#12015, 2.0.0)
- Stats: Remove sigar completely (#12010, 2.0.0)
- ZenDiscovery: #11960 failed to remove eager reroute from node join (#12019, 2.0.0, 1.7.0)
- Build: change groupIds & artifactIds (#12029, 2.0.0)
- Build: Update `maven-invoker-plugin` to 2.0.0 (#11990, 2.0.0)
- Mapping: Completely move doc values and fielddata settings to field types (#12014, 2.0.0)
- Scroll: Append the shard top docs in such a way to prevent `AOOBE` (#11978, 2.0.0, 1.7.0, 1.6.1)
- Build: Update to Apache Maven PMD Plugin 3.5 (#12065, 2.0.0)
- Testing: Add integration tests for analysis plugins (#12070, 2.0.0)
- Bulk: Use correct `OpType` on failure in BulkItemResponse (#12060, 2.0.0)
- Exceptions: Ban java serialization (#11910, 2.0.0)
- Search Templates: Adds API endpoint to render search templates as a response (#11570, 2.0.0)
- Testing: Add simple plugins smoke tester (#11957, 2.0.0)
- Internal: Cut over to writeable for TransportAddress (#11949, 2.0.0)
- Allocation: Reroute after node join is processed (#11960, 2.0.0, 1.7.0)
- Snapshot/Restore: Improve repository verification failure message (#11925, 2.0.0, 1.7.0, 1.6.1)
- Packaging: Detect jar hell before installing a plugin (#11963, 2.0.0)
- Recovery: Fix wrong reused file bytes in Recovery API reports (#11965, 2.0.0, 1.7.0, 1.6.1)
- Aggregations: Makes `ValueFormat` and ValueFormatter never null (#11943, 2.0.0)
- Packaging: `Postrm` script should not fail (#11678, 2.0.0, 1.6.1)
- Aggregations: Makes `SKIP` Gap Policy work correctly for Bucket Script aggregation (#11970, 2.0.0)
- Aggregations: Adds `other` bucket to filters aggregation (#11948, 2.0.0)
- Aggregations: Adds a new GapPolicy `NONE` (#11951, 2.0.0)
- Aggregations: Pipeline Aggregation to filter buckets based on a script (#11941, 2.0.0)
- Exceptions: Carry on rest status if exceptions are not serializable (#11973, 2.0.0)
- Mapping: Rename `root` mappers to "metadata" mappers (#11962, 2.0.0)
- Percolator: Use `time` as field name in serialization (#11954, 2.0.0)
- Aliases: Parse aliases at search time and never cache parsed alias filters (#11930, 2.0.0)
- Build: include in plugins only needed jars (#11944, 2.0.0)
- Internal: make sure `ParseField` is always used in combination with parse flags (#11859, 2.0.0)
- Discovery: Don't join master nodes or accept join requests of old and too new nodes (#11972, 2.0.0)
- Field stats: added `index_constraint` option (#11259, 2.0.0)
- Update API: Upsert does not use ttl value (#8715, 2.0.0)
- Dates: More strict parsing of ISO dates (#6227, 2.0.0)
Apache Lucene
- Yes, Lucene does have a standard code style!
- Why are tests sometimes inexplicably taking 2 hours?
- Don't use the all-powerful `setAccessible` to access a private field in `AttributeImpl.reflectWith`
- Add toposort to Lucene's automaton APIs
- Now you can create an `IndexWriter` based on the point-in-time view of an already opened `IndexReader`
- You no longer need write access when creating an `FSDirectory` if the directory already exists
- `IOUtils.spins` now works on NVMe drives
- `EarlyTerminatingCollector` now takes `Sort` directly, not `SortingMergePolicy`
- Working with fully binary terms is now simpler
- Can `SynonymFilter` generate a correct graph, using a separate graph flattening stage during indexing?
- When exactly should the query cache safely make note of a query's usage?
- Geo3D now computes circle planes more accurately
- Phase out `Filter` from the suggest, facet, grouping, join and spatial modules, as well as `PKIndexSplitter`
- `FSTTester` should not attempt to write `dot` files to the filesystem
- Finite strings from an automaton should be an iterator, not a fully populated, potentially massive `Set`
- Specialize `SpanPositionQueue` by folding in the priority queue implementation
- Can we speed up how `BKDTree` queries build their bitsets?
- Add K nearest neighbor and simple naive bayes document classifiers to Lucene's classifier module
- Should analysis factories really change the incoming Map of parameters?
- `StandardTokenizer.close` behaves badly if it hits an exception
- `KNearestNeighborClassifier` should use the class ranking not just its frequency in the top K results
- Remove `ToParentBlockJoinFieldComparator`, but add a selector to pick which numeric or string value to sort by
- Geo3D can now compute the arc distance from a point to a shape
- Pros and cons of moving Geo3D to the sandbox module are discussed
- `GeoPointInBBoxQuery` will continue computing the accepted ranges per segment
- Add `GeoPointDistanceQuery`, and handle dateline crossing
- Maybe we can fix invalid offsets from `MappingCharFilter`?
- `IndexWriter` should only list files once on init
- More reduction to `FieldInfos` heap usage
- A likely JDK 9 bug sometimes causes a false `assert` trip
- A new `CheckJoinIndex` utility validates the index for block joins
- New utility APIs will convert to/from geohashes
- Access denied on Windows if the checkout is symlink'd
- Reduce the base RAM used by an FST by about 20%
- `DuplicateFilter` is gone; use `DiversifiedTopDocsCollector` instead
- Our ant build files now check that ivy is set up before running Apache Rat
- `SegmentInfo.toString` now includes the sort key for the segment, if any
- `GermanStemmer` had a typo (`=+` instead of `+=`) that might cause incorrect stemming
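
For readers wondering why a `=+` typo compiles at all, here is a minimal Java sketch of that class of bug. It is not the actual `GermanStemmer` code; the class and method names are hypothetical and chosen only for illustration. Java parses `x =+ y` as `x = (+y)`, a plain assignment with a unary plus, so the previous value is silently discarded instead of being added to.

```java
public class CompoundAssignmentTypo {
    // Intended behavior: add delta to the running total.
    static int accumulate(int total, int delta) {
        total += delta;
        return total;
    }

    // Typo: "=+" is parsed as "= (+delta)", so the old total is overwritten.
    static int accumulateWithTypo(int total, int delta) {
        total =+ delta;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(accumulate(10, 3));         // prints 13
        System.out.println(accumulateWithTypo(10, 3)); // prints 3
    }
}
```

Because both forms are valid Java, the compiler gives no warning by default, which is likely why a typo like this can go unnoticed for a long time.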
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole ELK ecosystem including news, learning resources and cool use cases!