How MyRocks Outperforms InnoDB on SSD Storage: A Performance Comparison
- stalininvosubsho
- Aug 19, 2023
- 7 min read
These performance and architecture improvements have also enabled Messenger to add new features much more easily. The migration to MyRocks has helped us launch message content search on mobile, a frequently requested feature. Implementing mobile message search on HBase would have been hard: because HBase is I/O bound, building a search index would have required an equally read-heavy job. By switching to MyRocks, Messenger has directly adopted the established Facebook search infrastructure built on top of MySQL, and we now have more than sufficient headroom to allow users to search their conversations on mobile as well as on desktop.
A Look at MyRocks Performance
Matt Yonkovit, The HOSS at Percona, sits down with Nagavamsi (Vamsi) Ponnekanti, Software Engineer at Quora. During the show we dive into the details on how and why Quora moved from HBase to RocksDB via MyRocks. We also learn about the need to reduce latency and improve predictability in performance in large infrastructure and database systems. Vamsi highlights some of his favorite features and tools and gives us tips and tricks on database migrations.
Nagavamsi Ponnekanti: Yeah, so the read performance was one of the problems. The machine learning related use case, alchemy, which we covered in a separate blog post, was complaining about performance issues, the read latencies and so on. So that was one thing. The other thing was that we were using both MySQL and HBase. Both of these are very complex and very different from each other, so practically speaking there is no knowledge or tool sharing that can happen between the two; nobody ever wrote a tool that works for both MySQL and HBase. So that was double the complexity: when we hire a new engineer, even an experienced engineer, they may not be familiar with both MySQL and HBase. So yeah, those were the two main reasons: the read performance, especially the P99 read performance, and the fact that MySQL and HBase are so different, and both are complex.
Nagavamsi Ponnekanti: So I think one reason for that is that with MyRocks, the way we handle these tables now, we write to a different instance than the one we read from, so the bulk load can happen at high speed while the read performance is hopefully not affected by the bulk load. That, I think, explains the higher performance.
Matt Yonkovit: Right. Okay, and Vamsi, let me finish with this. Any advice for other people who are going to be looking at RocksDB, or looking at MyRocks? Anything that you learned that would be useful to pass along to them?
I hope the project will at some point start looking into making RocksDB a viable option, as it seems like a match made in heaven and only a matter of time before RocksDB starts grabbing more of InnoDB's market share. Just not at the moment, though, due to the user-acquired locks that seem to be scattered throughout the Matomo code base.
Most databases grow in size over time. The growth is not always fast enough to impact the performance of the database, but there are definitely cases where that happens. When it does, we often wonder what could be done to reduce that impact and how we can ensure smooth database operations when dealing with data on a large scale.
Another thing we have to keep in mind is that we typically only care about the active dataset. Sure, you may have terabytes of data in your schema, but if you only have to access the last 5GB, that is actually quite a good situation. It still poses operational challenges, but performance-wise it should be fine.
A: Amazon Aurora supports two kinds of replication: physical replication as implemented by Amazon (the default for replicas in the same region), and regular asynchronous replication for cross-region replication. If you use the former, I cannot help you, because this is a closed-source Amazon feature; you need to report a bug to Amazon. If you use the latter, this looks like a bug too; in my experience, it should not happen. With regular replication you need to check which transactions were applied (best if you use GTIDs, or at least the log-slave-updates option) and which were not. If you find a gap, report a bug at bugs.mysql.com.
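If GTIDs are enabled on both servers, one way to check for such a gap is to compare the gtid_executed sets on the source and the replica. A minimal sketch; the placeholder value below is hypothetical and should be replaced with the replica's actual gtid_executed string:

```sql
-- On the replica: the set of transactions it has applied.
SELECT @@GLOBAL.gtid_executed;

-- On the source: transactions executed there but absent from the replica.
-- Paste the replica's gtid_executed value in place of the placeholder.
SELECT GTID_SUBTRACT(@@GLOBAL.gtid_executed,
                     '<replica_gtid_executed>') AS missing_on_replica;
```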
Manyi: There are a number of interesting features in 8.0. CTE, or Common Table Expression, has been one of the most demanded SQL features. MySQL 8.0 will support both the WITH and WITH RECURSIVE clauses. A recursive CTE is quite useful for producing reports based on hierarchical data. For DBAs, Invisible Index should make life easier: they can mark an index invisible to the optimizer, check the performance, and then decide to either drop it or keep it. On the performance side, we have improved the performance of table scans, range scans and similar queries by batching up records read from the storage engine into the server. We also have significant work happening in the cost model area. In order to produce more optimal query plans, we have started work on adding support for histograms, and for taking into account whether data is already in memory or needs to be read from disk.
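To make those two features concrete, here is a minimal sketch against a hypothetical employees table and index name, using the MySQL 8.0 syntax described above:

```sql
-- Recursive CTE: walk a hierarchy (an org chart) from the root downwards.
WITH RECURSIVE chain AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, c.depth + 1
  FROM employees e
  JOIN chain c ON e.manager_id = c.id
)
SELECT * FROM chain ORDER BY depth, id;

-- Invisible index: hide it from the optimizer, observe the workload,
-- then either make it visible again or drop it.
ALTER TABLE employees ALTER INDEX idx_manager INVISIBLE;
ALTER TABLE employees ALTER INDEX idx_manager VISIBLE;
```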
Manyi: I like to speak and get feedback from MySQL users. Their input has a big impact on our roadmap. I also look forward to learning more about innovations by web-scale players like Facebook, Alibaba and others. I always feel more energized after talking to people who are passionate about MySQL and databases in general.
The countdown is on for the annual Percona Live Europe Open Source Database Conference! This year the conference will take place in the great city of Amsterdam, October 3-5. This three-day conference will focus on the latest trends, news and best practices in MySQL, MongoDB, PostgreSQL and other open source databases, while tackling subjects such as analytics, architecture and design, security, operations, scalability and performance. Percona Live provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs.
About duplicate keys: if I have a UNIQUE KEY on two columns, is it OK to also set a key on each of these columns? Or should I only keep the unique key on the two columns and get rid of the regular key on each individual column? As I mentioned during the talk, for a composite index the leftmost prefix is used. For example, if you have a UNIQUE INDEX on columns A, B as (A, B), then this index is not used for a lookup that filters only on the second column, as in the example below.
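A minimal illustration, using a hypothetical table t with a UNIQUE KEY on (A, B); only the shape of the queries matters here:

```sql
CREATE TABLE t (
  A INT NOT NULL,
  B INT NOT NULL,
  UNIQUE KEY uk_ab (A, B)
);

-- These lookups can use uk_ab, because the leftmost column A is constrained:
SELECT * FROM t WHERE A = 1 AND B = 10;
SELECT * FROM t WHERE A = 1;

-- This lookup cannot use uk_ab (no leftmost prefix), so a separate
-- index on B is still worth keeping if this access pattern matters:
SELECT * FROM t WHERE B = 10;
ALTER TABLE t ADD INDEX idx_b (B);
```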
Can you find candidate missing indexes by looking at the slow query log? Yes, as I mentioned, you can find queries that do not use any index by enabling log_queries_not_using_indexes; they are written to the slow query log and are good candidates for new indexes. You can also enable the user_statistics feature, which adds several information_schema tables, and use it to find unused indexes. pt-index-usage is yet another tool from Percona Toolkit for this purpose. Also, check this blog post on the topic.
How do you find unused indexes? They also have an impact on performance. Unused indexes can be found with the help of the pt-index-usage tool from Percona Toolkit, as mentioned above. If you are using Percona Server, you can also use the User Statistics feature. Check this blog post from my colleague, which shows another technique for finding unused indexes.
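Alongside pt-index-usage and User Statistics, the sys schema shipped with MySQL 5.7 and later exposes the same kind of information. A hedged sketch; it relies on performance_schema instrumentation being enabled, and the counters reset when the server restarts:

```sql
-- Indexes with no recorded reads since the server was last restarted.
SELECT * FROM sys.schema_unused_indexes;

-- The underlying performance_schema data, if you prefer to query it directly.
SELECT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND index_name <> 'PRIMARY'
  AND count_star = 0
ORDER BY object_schema, object_name;
```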
ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server and MariaDB). It acts as an intermediary for client requests seeking resources from the database. ProxySQL was created for DBAs by René Cannaò, as a means of solving complex replication topology issues.
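To make that intermediary role concrete, ProxySQL is itself configured through a SQL-like admin interface (port 6032 by default). A minimal, hedged sketch of registering two backends and a read/write split rule; the hostnames and hostgroup numbers are purely hypothetical:

```sql
-- Register the backends in separate hostgroups (writer = 10, reader = 20).
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, 'db-primary', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, 'db-replica', 3306);

-- Route SELECTs to the reader hostgroup; other statements fall back to the
-- user's default hostgroup (the writer, in this sketch).
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT', 20, 1);

-- Apply the configuration to the running proxy and persist it.
LOAD MYSQL SERVERS TO RUNTIME;  SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;  SAVE MYSQL QUERY RULES TO DISK;
```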
Through our internal benchmarking and some user reports, we have found that with long-term, heavy write use, TokuDB/PerconaFT performance can degrade significantly on large data files. Using smaller node sizes (one of our performance tuning recommendations when you have faster storage) makes the problem worse. The problem manifests as low CPU utilization, a drop in overall TPS and high client response times during prolonged checkpointing.
To find a suitable hole in which to place node data, the current block allocator starts at the first block in the array. It iterates through the blocks looking for a hole between blocks that is large enough to hold the node's data. Once we find a hole, we cut the space needed for the node out of the hole, and the remainder is left as a hole for another block to possibly use later.
This linear search severely degrades PerconaFT performance for very large and fragmented dictionary files. We have some solid evidence from the field that this does occur: it shows up in various profiling tools as a lot of time spent within block_allocator_strategy::first_fit. It is also quite easy to reproduce by using very small node (block) sizes and small fanouts (which forces the existence of more nodes, and thus more small holes). This fragmentation can and does cause all sorts of side effects, because the search operation locks the entire structure in memory, blocking nodes from translating their node/block IDs into file locations.
Using Apache Spark on top of the existing MySQL server(s) (without the need to export or even stream data to Spark or Hadoop), we can increase query performance more than ten times. Using multiple MySQL servers (replication or Percona XtraDB Cluster) gives us an additional performance increase for some queries. You can also use the Spark cache function to cache the whole MySQL query results table.
Now, this looks really good, but it can be better. With three m4.2xlarge nodes we will have 8 x 3 = 24 cores in total (although they are shared between Spark and MySQL). We can expect roughly a 10x improvement, especially for queries without a covering index.
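One way to sketch this setup from the SQL side is Spark's JDBC data source, which lets Spark SQL expose a MySQL table as a view, split the scan into parallel partitions, and cache the result across queries. The connection details, table, and column names below are hypothetical:

```sql
-- Run inside spark-sql (or a SparkSession): map a MySQL table to a Spark view.
CREATE OR REPLACE TEMPORARY VIEW flights
USING org.apache.spark.sql.jdbc
OPTIONS (
  url             'jdbc:mysql://mysql-host:3306/stats',
  dbtable         'flights',
  user            'spark',
  password        '********',
  partitionColumn 'flight_year',   -- hypothetical numeric column used to split the scan
  lowerBound      '1990',
  upperBound      '2020',
  numPartitions   '24'             -- roughly one partition per available core
);

-- The aggregation now runs in parallel on Spark's cores instead of a single MySQL thread.
SELECT carrier, COUNT(*) AS cnt
FROM flights
GROUP BY carrier
ORDER BY cnt DESC;

-- Optionally keep the scanned data in Spark's cache for repeated queries.
CACHE TABLE flights;
```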
The use case is the TPC-C benchmark, but executed not on a high-end server but on a lower-spec virtual machine that is I/O limited, as with AWS EBS volumes, for example. I decided to use a virtual machine with two CPU cores, four GB of memory, and storage limited to a maximum of 1,000 IOPS of 16KB. The storage device has performance characteristics pretty similar to an AWS gp2 EBS volume of about 330 GB in size. I emulated these limits using the KVM iotune settings in my lab.