Elasticsearch Refresh API vs Flush API
The refresh API and the flush API may seem to produce the same results, but they differ in performance and persistence.
As we discussed in the previous post, each Elasticsearch shard is a Lucene index. To understand the refresh and flush APIs in Elasticsearch, we need to talk about two Lucene operations: reopen and commit.
Reopen, as its name suggests, reopens an index when it is called. Calling reopen after indexing some documents makes those documents searchable. It writes the documents from the memory buffer into a new segment and opens that segment for search. However, the new segment still lives only in memory (the filesystem cache) and has not been fsynced to disk, so the documents will be lost if the server fails.
Commit, on the other hand, fsyncs the segments to disk and records a new commit point, so the documents become persistent. (Merging segments is a separate Lucene process.) A commit is resource intensive and expensive.
The Elasticsearch refresh API calls Lucene's reopen and makes documents searchable. Because of how Lucene works, each refresh that finds new documents creates a new segment.
As mentioned above, a refresh is a reopen operation and happens in memory, so data would be lost on a server failure. To prevent this, Elasticsearch writes each operation to the translog (one translog per shard) at the same time as it writes to the memory buffer. The translog is fsynced to disk, so it provides persistence even if the documents in memory disappear.
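To make the interplay between the memory buffer, the translog, refresh, and flush easier to follow, here is a toy model in Python. It is only a sketch of the idea described above, not how Elasticsearch or Lucene is actually implemented: the class ToyShard and all of its attribute and method names are made up for illustration, and "disk" and "memory" are just Python lists.

```python
class ToyShard:
    """Toy model of one shard (hypothetical, for illustration only)."""

    def __init__(self):
        self.buffer = []      # indexed, not yet searchable (memory only)
        self.translog = []    # durable log of operations since the last commit
        self.searchable = []  # segments opened for search but not committed
        self.committed = []   # segments made durable by a commit

    def index(self, doc):
        # A write goes to the memory buffer and to the translog together.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Reopen: turn the buffer into a new searchable segment.
        if self.buffer:
            self.searchable.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Commit: make every segment durable, then empty the translog.
        self.refresh()
        self.committed.extend(self.searchable)
        self.searchable.clear()
        self.translog.clear()

    def search(self):
        # Both committed and merely refreshed segments are searchable.
        return [doc for seg in self.committed + self.searchable for doc in seg]

    def crash_and_recover(self):
        # Uncommitted segments are lost; the translog is replayed
        # back into the memory buffer.
        self.searchable = []
        self.buffer = list(self.translog)


shard = ToyShard()
shard.index("doc-1")
shard.refresh()                 # doc-1 is now searchable, but not committed
shard.crash_and_recover()       # the segment is gone, the translog is not
shard.refresh()                 # the replayed document is searchable again
shard.flush()                   # commit; the translog is emptied
```

In this model a document survives a crash before any flush only because it is still in the translog, which is the role the translog plays in the paragraph above.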
At regular intervals, or when the translog grows large enough, the data in the translog is committed to the Lucene index and becomes persistent. In this case the flush API is called implicitly and a commit is performed. You can also call the flush API explicitly to trigger a Lucene commit.
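As a rough illustration of the explicit calls, the sketch below builds the HTTP requests for the refresh and flush endpoints (POST /<index>/_refresh and POST /<index>/_flush) using only the Python standard library. The host http://localhost:9200, the index name my-index, and the helper build_request are assumptions made for this example; nothing is sent until the request is passed to urlopen against a running cluster.

```python
from urllib.request import Request


def build_request(base_url, index, action):
    """Build (but do not send) a POST request for _refresh or _flush.

    Hypothetical helper for illustration; base_url and index are
    whatever your own cluster uses.
    """
    if action not in ("_refresh", "_flush"):
        raise ValueError("action must be '_refresh' or '_flush'")
    return Request(f"{base_url}/{index}/{action}", method="POST")


# Assumed local node and index name.
refresh_req = build_request("http://localhost:9200", "my-index", "_refresh")
flush_req = build_request("http://localhost:9200", "my-index", "_flush")

# To actually send one of them against a running cluster:
#   from urllib.request import urlopen
#   with urlopen(flush_req) as response:
#       print(response.read().decode())
```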
In short, the refresh API just makes documents searchable in memory, while the flush API performs a Lucene commit and makes documents persistent. The flush API should be called with care because it is an expensive operation.