07 July 2015

This Week in Elasticsearch and Apache Lucene - July 7, 2015

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Everything you want to know about Found’s hosted #Elasticsearch service - register here: https://t.co/zVQS10zBZV
— Found by Elastic (@foundsays) July 6, 2015

Elasticsearch Core

Packaging: Don't jarhell check system jars (#11979, 2.0.0)
Exceptions: Parameterized exception messages (#11981, 2.0.0)
Internal: really ban exitVM with security policy (#11982, 2.0.0)
Transport: Do not make the buffer skip while a stream is open (#11988, 2.0.0)
Mapping: Move short name access out of field type (#11977, 2.0.0)
Packaging: Don't add CWD to classpath when ES_CLASSPATH isn't set (#12001, 2.0.0)
Exceptions: Promote headers to first class citizens on exceptions (#12006, 2.0.0)
Status: Replacing sigar (#11995, 2.0.0)
Exceptions: Don't special-case on ElasticsearchWrapperException in toXContent (#12015, 2.0.0)
Stats: Remove sigar completely (#12010, 2.0.0)
ZenDiscovery: #11960 failed to remove eager reroute from node join (#12019, 2.0.0, 1.7.0)
Build: change groupIds & artifactIds (#12029, 2.0.0)
Build: Update maven-invoker-plugin to 2.0.0 (#11990, 2.0.0)
Mapping: Completely move doc values and fielddata settings to field types (#12014, 2.0.0)
Scroll: Append the shard top docs in such a way to prevent AOOBE (#11978, 2.0.0, 1.7.0, 1.6.1)
Build: Update to Apache Maven PMD Plugin 3.5 (#12065, 2.0.0)
Testing: Add integration tests for analysis plugins (#12070, 2.0.0)
Bulk: Use correct OpType on failure in BulkItemResponse (#12060, 2.0.0)
Exceptions: Ban java serialization (#11910, 2.0.0)
Search Templates: Adds API endpoint to render search templates as a response (#11570, 2.0.0)
Testing: Add simple plugins smoke tester (#11957, 2.0.0)
Internal: Cut over to writeable for TransportAddress (#11949, 2.0.0)
Allocation: Reroute after node join is processed (#11960, 2.0.0, 1.7.0)
Snapshot/Restore: Improve repository verification failure message (#11925, 2.0.0, 1.7.0, 1.6.1)
Packaging: Detect jar hell before installing a plugin (#11963, 2.0.0)
Recovery: Fix wrong reused file bytes in Recovery API reports (#11965, 2.0.0, 1.7.0, 1.6.1)
Aggregations: Makes ValueFormat and ValueFormatter never null (#11943, 2.0.0)
Packaging: Postrm script should not fail (#11678, 2.0.0, 1.6.1)
Aggregations: Makes SKIP Gap Policy work correctly for Bucket Script aggregation (#11970, 2.0.0)
Aggregations: Adds other bucket to filters aggregation (#11948, 2.0.0)
Aggregations: Adds a new GapPolicy NONE (#11951, 2.0.0)
Aggregations: Pipeline Aggregation to filter buckets based on a script (#11941, 2.0.0)
Exceptions: Carry on rest status if exceptions are not serializable (#11973, 2.0.0)
Mapping: Rename root mappers to "metadata" mappers (#11962, 2.0.0)
Percolator: Use time as field name in serialization (#11954, 2.0.0)
Aliases: Parse aliases at search time and never cache parsed alias filters (#11930, 2.0.0)
Build: include in plugins only needed jars (#11944, 2.0.0)
Internal: make sure ParseField is always used in combination with parse flags (#11859, 2.0.0)
Discovery: Don't join master nodes or accept join requests of old and too new nodes (#11972, 2.0.0)
Field stats: added index_constraint option (#11259, 2.0.0)
Update API: Upsert does not use ttl value (#8715, 2.0.0)
Dates: More strict parsing of ISO dates (#6227, 2.0.0)

Apache Lucene

Yes, Lucene does have a standard code style!
Why are tests sometimes inexplicably taking 2 hours?
Don't use the all-powerful setAccessible to access a private field in AttributeImpl.reflectWith
Add toposort to Lucene's automaton APIs
Now you can create an IndexWriter based on the point-in-time view of an already opened IndexReader
You no longer need write access when creating an FSDirectory if the directory already exists
IOUtils.spins now works on NVMe drives
EarlyTerminatingCollector now takes Sort directly, not SortingMergePolicy
Working with fully binary terms is now simpler
Can SynonymFilter generate a correct graph, using a separate graph flattening stage during indexing?
When exactly should the query cache safely make note of a query's usage?
Geo3D now computes circle planes more accurately
Phase out Filter from the suggest, facet, grouping, join and spatial modules, as well as PKIndexSplitter
FSTTester should not attempt to write dot files to the filesystem
Finite strings from an automaton should be an iterator not a fully populated, potentially massive Set
Specialize SpanPositionQueue by folding in the priority queue implementation
Can we speed up how BKDTree queries build their bitsets?
Add K nearest neighbor and simple naive bayes document classifiers to Lucene's classifier module
Should analysis factories really change the incoming Map of parameters?
StandardTokenizer.close behaves badly if it hits an exception
KNearestNeighborClassifier should use the class ranking not just its frequency in the top K results
Remove ToParentBlockJoinFieldComparator, but add a selector to pick which numeric or string value to sort by
Geo3D can now compute the arc distance from a point to a shape
Pros and cons of moving Geo3D to the sandbox module are discussed
GeoPointInBBoxQuery will continue computing the accepted ranges per segment
Add GeoPointDistanceQuery, and handle dateline crossing
Maybe we can fix invalid offsets from MappingCharFilter?
IndexWriter should only list files once on init
More reduction to FieldInfos heap usage
A likely JDK 9 bug sometimes causes a false assert trip
A new CheckJoinIndex utility validates the index for block joins
New utility APIs will convert to/from geohashes
Access denied on Windows if the checkout is symlinked
Reduce the base RAM used by an FST by about 20%
DuplicateFilter is gone; use DiversifiedTopDocsCollector instead
Our ant build files now check that ivy is set up before running Apache Rat
SegmentInfo.toString now includes the sort key for the segment, if any
GermanStemmer had a typo (=+ instead of +=) that might cause incorrect stemming
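
The `=+` versus `+=` slip in the last item is easy to miss because both forms compile: `x =+ 1` is read as `x = (+1)`, a plain assignment, not an increment. The sketch below shows the effect in Python, where the same typo is equally legal. It is not the GermanStemmer code; the function names and the vowel-counting task are hypothetical and chosen only for illustration.

```python
VOWELS = "aeiou"


def count_vowels_buggy(word: str) -> int:
    """Hypothetical counter containing the typo: '=+' instead of '+='."""
    count = 0
    for ch in word:
        if ch in VOWELS:
            count =+ 1  # typo: parsed as count = (+1), so the counter is reset to 1
    return count


def count_vowels_fixed(word: str) -> int:
    """The same counter with the intended '+=' increment."""
    count = 0
    for ch in word:
        if ch in VOWELS:
            count += 1  # increments the running total
    return count


if __name__ == "__main__":
    # "stemming" has two vowels (e, i); the buggy version never gets past 1.
    print(count_vowels_buggy("stemming"), count_vowels_fixed("stemming"))  # prints: 1 2
```

Because the buggy version still returns a plausible number, a typo like this tends to show up only as subtly wrong results, never as an error.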

Watch This Space

Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!
