Planet MySQL

Planet MySQL - https://planet.mysql.com
  1. Re-Slaving a Crashed MySQL Master Server in Semisynchronous Replication Setup

    In a MySQL 5.7 master-slave setup that uses the default semisynchronous replication setting for rpl_semi_sync_master_wait_point, a crash of the master and failover to the slave is considered to be lossless. However, when the crashed master comes back, you may find that it has transactions that are not present in the current master (which was previously a slave). This behavior may be puzzling, given that semisynchronous replication is supposed to be lossless, but it is actually expected behavior in MySQL. Why exactly this happens is explained in full detail in the blog post by Jean-François Gagné (JF).

    Given such a scenario, the MySQL documentation recommends that the crashed master be discarded and not restarted. However, discarding a server like this is expensive and inefficient. In this blog post, we explain an approach to detect and fix transactions on the crashed MySQL master server in a semisynchronous replication setup, and how to re-slave it back into your master-slave setup.

    Why Is It Important to Detect Extra Transactions on the Recovered Master?

    The extra transactions on the recovered master can manifest in two ways:

    1. MySQL replication failures when the recovered master is re-slaved

    Typically, this happens when you have an auto-increment primary key. When the new MySQL master inserts a row into such a table, the replication will fail because the key already exists on the slave. Another scenario is when your app retries the transaction that had failed during the master crash. On the recovered MySQL master (which is now a slave), this transaction would actually exist, and again, this results in a replication error. Typically, the MySQL replication error would look like this:

    [ERROR] Slave SQL for channel '': Worker 5 failed executing transaction 'fd1ba8f0-cbee-11e8-b27f-000d3a0df42d:5938858' at master log mysql-bin.000030, end_log_pos 10262184; Error 'Duplicate entry '5018' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into test values(5018,2019,'item100')', Error_code: 1062

    2. Silent inconsistency in data between the new MySQL master and slave (recovered master)

    In cases where the application does not retry the failed transaction and there are no primary key collisions in the future, a replication error may not occur. As a result, the data inconsistency may go undetected.

    In both cases above, either the high availability or the data integrity of your MySQL setup is impacted, which is why it's so important to detect this condition as early as possible.

    How to Detect Extra Transactions on the Recovered MySQL Master

    We can detect whether there are any extra transactions on the recovered master using the MySQL GTID (global transaction identifier) function GTID_SUBSET(set1, set2): given two sets of global transaction IDs, set1 and set2, it returns true if all GTIDs in set1 are also in set2, and false otherwise.

    Let's use an example to understand this. The GTID set on the recovered master, whose UUID is '54a63bc3-d01d-11e7-bf52-000d3af93e52', is:

    '54a63bc3-d01d-11e7-bf52-000d3af93e52:1-9700,57956099-d01d-11e7-80bc-000d3af97c09:1-810'

    The GTID set of the new master, whose UUID is '57956099-d01d-11e7-80bc-000d3af97c09', is:

    '54a63bc3-d01d-11e7-bf52-000d3af93e52:1-9690,57956099-d01d-11e7-80bc-000d3af97c09:1-870'

    Now, if we call the GTID_SUBSET function as GTID_SUBSET(GTID set of recovered master, GTID set of new master), the return value will be true only if the recovered master does not have any extra transactions.
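    For reference, here is that check written out as a query, using the two GTID sets from this example (GTID_SUBSET is a built-in MySQL GTID function, so this can be run on any server in the setup):

        -- Is the recovered master's GTID set a subset of the new master's?
        -- Returns 1 (true) only if the recovered master has no extra transactions.
        SELECT GTID_SUBSET(
            '54a63bc3-d01d-11e7-bf52-000d3af93e52:1-9700,57956099-d01d-11e7-80bc-000d3af97c09:1-810',
            '54a63bc3-d01d-11e7-bf52-000d3af93e52:1-9690,57956099-d01d-11e7-80bc-000d3af97c09:1-870'
        ) AS recovered_is_subset;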
    In our example above, since the recovered master has extra transactions 9691 to 9700, the result of the above query is false.

    How to Re-Slave the Recovered MySQL Master That Has Extra Transactions

    Based on the step above, it is possible to know whether the recovered master has extra transactions, and what these transactions are, using the GTID function GTID_SUBTRACT(GTID set of recovered master, GTID set of new master). It is also possible to extract these extra transactions from the binary logs and save them. It may be useful for your business team to later review these transactions to make sure you are not inadvertently losing any important business information, even though it was uncommitted. Once this is done, we need a way to get rid of these extra transactions so that the recovered master can be re-slaved without issues.

    One of the simplest ways to do this is to take a backup snapshot on the current master and restore the data onto your current slave. Remember that you need to retain the UUID of this server as before. After you've restored the data, the server can be re-slaved, and it will start replication from the point of the restored snapshot. You will soon have a healthy slave running again!

    The steps above are very tedious if you have to perform them manually, but ScaleGrid's fully managed MySQL hosting service can automate the entire process for you without any intervention required. Here's how it works: If your current master crashes, ScaleGrid automates the failover process and promotes a suitable slave as the new master. The old master is then recovered, and we automatically detect if there are extra transactions on it. If any are found, the MySQL deployment is put in a degraded state, and we use automated tools to pull out the extra transactions and save them for your review. Our support team can then restore the old master to a good state and re-slave it back into your master-slave setup so that you have a healthy deployment again! Want to give it a try? Start a free 30-day trial to explore all the MySQL database management capabilities at ScaleGrid.
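    For completeness, here is the GTID_SUBTRACT check from the section above written out with the example GTID sets; with these sets it returns exactly the extra range (transactions 9691 to 9700 on the recovered master's UUID):

        -- Which GTIDs does the recovered master have that the new master does not?
        SELECT GTID_SUBTRACT(
            '54a63bc3-d01d-11e7-bf52-000d3af93e52:1-9700,57956099-d01d-11e7-80bc-000d3af97c09:1-810',
            '54a63bc3-d01d-11e7-bf52-000d3af93e52:1-9690,57956099-d01d-11e7-80bc-000d3af97c09:1-870'
        ) AS extra_transactions;
        -- Expected result for this example: '54a63bc3-d01d-11e7-bf52-000d3af93e52:9691-9700'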
  2. MySQL Encryption: Talking About Keyrings

    It has been possible to enable Transparent Data Encryption (TDE) in Percona Server for MySQL/MySQL for a while now, but have you ever wondered how it works under the hood and what kind of implications TDE can have on your server instance? In this blog post series, we are going to have a look at how TDE works internally. First, we talk about keyrings, as they are required for any encryption to work. Then we explore in detail how encryption in Percona Server for MySQL/MySQL works and what the extra encryption features are that Percona Server for MySQL provides.

    MySQL Keyrings

    Keyrings are plugins that allow a server to fetch/create/delete keys in a local file (keyring_file) or on a remote server (for example, HashiCorp Vault). All keys are cached locally inside the keyring's cache to speed up fetching keys. Keyring plugins can be separated into two categories, based on the backend they use for storing keys:

    - a local resource, such as a local file (we call this a file-based keyring)
    - a remote resource, such as a Vault server (we call this a server-based keyring)

    The separation is important because, depending on the backend, keyrings behave a bit differently, not only when storing/fetching keys but also on startup. In the case of a file-based keyring, the keyring loads the entire content of the keyring on startup (i.e., key id, key user, key type, together with the keys themselves) into the cache. In the case of a server-based keyring (for instance, Vault server), the server loads only the list of key ids and key users on startup, so startup is not slowed down by retrieving all of the keys from the server. The keys are lazy-loaded, which means that the first time the server requests a key, keyring_vault asks the Vault server to send it. The keyring then caches the key in memory so that, in the future, the server can use memory access instead of a TLS connection to the Vault server to retrieve the key.

    It is worth mentioning what information is stored in the keyring backend. A record in the keyring consists of the following:

    - key id – the ID of the key, for instance: INNODBKey-764d382a-7324-11e9-ad8f-9cb6d0d5dc99-1
    - key type – the type of the key, based on the encryption algorithm used; possible values are "AES", "RSA" or "DSA"
    - key length – measured in bytes; AES: 16, 24 or 32, RSA: 128, 256 or 512, and DSA: 128, 256 or 384
    - user – the owner of the key. If the key is a system key, such as the Master Key, this field is empty. When the key is created with keyring_udf, this field is the owner of the key.
    - the key itself

    Each key is uniquely identified by the pair (key_id, user).

    There are also differences when it comes to storing and deleting keys. File-based keyring operations should be faster, and they are, but you may assume that storing a key is just a single write of the key to a file, when in fact more tasks are involved. Before any modification of a file-based keyring, the keyring creates a backup file with the entire content of the keyring and places this backup file next to the keyring file. Let's say your keyring file is called my_biggest_secrets; the backup is named my_biggest_secrets.backup. Next, the keyring modifies the cache to add or remove a key, and if this task is successful, it dumps (i.e., rewrites) the entire content of the keyring from the cache into your keyring file. On rare occasions, such as a server crash, you can observe this backup file. The backup file is deleted by the keyring the next time the keyring is loaded (generally after a server restart).
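    To make the (key_id, user) pair concrete, here is a quick, hedged way to inspect it on a recent MySQL/Percona Server 8.0 instance (the performance_schema.keyring_keys table lists key ids and owners only, never the key material, and may not exist on older versions):

        -- Which keyring plugin is loaded?
        SELECT PLUGIN_NAME, PLUGIN_STATUS
          FROM information_schema.PLUGINS
         WHERE PLUGIN_NAME LIKE 'keyring%';

        -- Key ids and their owners as seen by the keyring (no key material is shown).
        SELECT * FROM performance_schema.keyring_keys;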
    When storing or deleting a key, the server-based keyring must connect to the backend server and ask it to store the key or to delete it.

    Let's get back to the speed of server startup. Apart from the keyring itself impacting the startup time, there is also the matter of how many keys must be retrieved from the backend server on startup. Of course, this is especially important for server-based keyrings. On server startup, the server checks which key is needed to decrypt each encrypted table/tablespace and fetches this key from the keyring. On a "clean" server with Master Key encryption, there should be one Master Key that must be fetched from the keyring. However, more keys can be required, for instance, when a slave is re-created from a master backup, etc. In those cases, it is good to consider Master Key rotation. I will talk more about that in future blog posts, but I just wanted to outline here that a server that is using multiple Master Keys might take a bit longer to start up, primarily when a server-based keyring is used.

    Now let's talk some more about keyring_file. When I was developing keyring_file, a concern was how to be sure that the keyring file is not changed under the running server. In 5.7, the check is done based on file stats, which is not a perfect solution, and in 8.0 it was replaced with a SHA256 checksum. When keyring_file is first started, the file stats and checksum are calculated and remembered by the server, and changes are only applied if those match. Of course, the checksum is updated as the file gets updated.

    We have covered a lot of ground on keyrings so far. There is one more important topic, though, that is often forgotten or misunderstood – the per-server separation of keyrings, and why this is essential. What do I mean by that? I mean that each server (let's say Percona Server) in a cluster should have a separate place on the Vault server where it stores its keys. Master Keys stored in the keyring have each Percona Server's GUID embedded in their ids. Why is this important? Imagine you have one Vault server with keys, and all of the Percona Servers in your cluster are using this one Vault server. The problem seems obvious – if all of the Percona Servers were using Master Keys without unique ids – for instance, id = 1, id = 2, etc. – all the Percona Servers in the cluster would be using the same Master Key. What the GUID provides is this per-server separation.

    Why talk about the per-server separation of keyrings, since there is already a separation with the unique GUID per Percona Server? Well, there is one more plugin, keyring_udf. With this plugin, a user of your server can store their own keys inside the Vault server. The problem arises when your user creates a key on, let's say, server1, and then attempts to create a key with the same identifier (key_id) on server2, like this:

        --server1:
        select keyring_key_store('ROB_1','AES',"123456789012345");
        1 --1 means success

        --server2:
        select keyring_key_store('ROB_1','AES',"543210987654321");
        1

    Wait. What!? Since both servers use the same Vault server, shouldn't keyring_key_store fail on server2? Interestingly enough, if you try to do the same on just one server, you will get a failure:

        --server1:
        select keyring_key_store('ROB_1','AES',"123456789012345");
        1
        select keyring_key_store('ROB_1','AES',"543210987654321");
        0

    Right, ROB_1 already exists. Let's discuss the second example first.
    As we discussed earlier, keyring_vault (or any other keyring plugin) caches all of the key ids in memory. So when the new key ROB_1 is added on server1, apart from being sent to Vault, the key is also added to the keyring's cache. Now, when we try to add the same key for the second time, keyring_vault checks whether this key already exists in the cache and errors out.

    The story is different in the first example. The keyring on server1 has its own cache of the keys stored on the Vault server, and server2 has its own cache. After ROB_1 is added to the keyring's cache on server1 and to the Vault server, the keyring's cache on server2 is out of sync. The cache on server2 does not have the ROB_1 key; thus, keyring_key_store succeeds and writes ROB_1 to the Vault server, which actually overwrites (!) the previous value. Now the key ROB_1 on the Vault server is 543210987654321. Interestingly enough, the Vault server does not block such actions and happily overwrites the old value.

    Now we see why this per-server separation on the Vault server can be significant – in case you allow the use of keyring_udf, and also if you want to keep the keys in your Vault organized. How can we ensure this separation on the Vault server?

    There are two ways of achieving the separation on the Vault server. You can create a mount point per server in the Vault server, or you can use different paths inside the same mount point, with one path per server. It is best to explain those two approaches with examples, so let's have a look at the configuration files. First, mount point separation:

        --server1:
        vault_url = http://127.0.0.1:8200
        secret_mount_point = server1_mount
        token = (...)
        vault_ca = (...)

        --server2:
        vault_url = http://127.0.0.1:8200
        secret_mount_point = server2_mount
        token = (...)
        vault_ca = (...)

    We can see that server1 is using a different mount point than server2. With path separation, the configuration files would look like the following:

        --server1:
        vault_url = http://127.0.0.1:8200
        secret_mount_point = mount_point/server1
        token = (...)
        vault_ca = (...)

        --server2:
        vault_url = http://127.0.0.1:8200
        secret_mount_point = mount_point/server2
        token = (...)
        vault_ca = (...)

    In this case, both servers are using the same secret mount point – "mount_point" – but different paths. When you create the first secret on server1 in this path, the Vault server automatically creates a "server1" directory; the same happens for server2. When you remove the last secret in mount_point/server1 or mount_point/server2, the Vault server removes those directories as well. As we can see, if you use path separation, you only have to create one mount point and modify the configuration files so that the servers use separate paths. A mount point can be created with an HTTP request. With curl, it's:

        curl -L -H "X-Vault-Token: TOKEN" --cacert VAULT_CA --data '{"type":"generic"}' --request POST VAULT_URL/v1/sys/mounts/SECRET_MOUNT_POINT

    All of the fields (TOKEN, VAULT_CA, VAULT_URL, SECRET_MOUNT_POINT) correspond to the options from the keyring configuration file. Of course, you can also use the vault binary to do the same. The point is that mount point creation can be automated.

    I hope you find this information helpful, and we will see each other in the next blog post of this series.
    Thanks, Robert
  3. NDB Parallel Query, part 4

    In this part we will discuss how NDB batch handling works. Execution of complex SQL queries means that more rows can be delivered than the receiver is capable of receiving. This means that we need to create a data flow from the producer, where the data resides, to the query executor in the MySQL Server.

    The MySQL Server uses a record where the storage engine has to copy the result row into the record. This means that the storage of batches of rows is taken care of by the storage engine.

    When NDB performs a range scan it will decide on the possible parallelism before the scan is started. The NDB API has to allocate enough memory to ensure that we have memory prepared to receive the rows as they arrive in a flow of result rows from the data nodes. It is possible to set a batch size of hundreds and even thousands of rows for a query.

    The executor of the scan is the DBTC module in the TC threads. This module only passes messages through and sends them to the proper place. There is no storage of result rows in DBTC. There is only one TC thread involved in one scan (range scan or full table scan). The TC thread will decide which modules should handle each individual fragment scan. The message to scan contains a set of references to the memory available in the NDB API. This set of references is in turn distributed to the fragment scans. This means that these can send result rows directly to the NDB API.

    When a fragment scan has completed sending rows for all memory references it cannot continue until the NDB API has processed these rows. The fragment scan, handled by the DBLQH module in the LDM threads, will send information to the DBTC module that it is waiting for a continue request. The DBTC module will ensure that the NDB API knows that it should receive a set of rows as specified in the response to the NDB API.

    As soon as the NDB API has processed the set of rows it will inform the DBTC module that it is now ready to receive more rows. Since there are multiple fragment scans it is possible that rows have been continuously received in the NDB API while it was processing the rows received previously.

    As can be seen from the above description, the fragment scans will not be actively performing the scans all the time. It would be possible to scan in the DBLQH module and store the result rows locally there until the continue request arrives. This is not done currently; it would obviously increase the parallelism for a specific scan, but at the same time it would also increase the overhead of the scan.

    When we execute the special scans that execute joins in NDB in the DBSPJ module we also have batches to handle. The NDB API will allocate memory for a set of rows for each table, thus the total batch size can become quite high. It is however limited to a maximum of 4096 rows per table.

    When DBSPJ concludes a batch towards the NDB API it will wait for the NDB API to process those rows. However, other DBSPJ modules working on other parts of the query can continue the join processing. Actually, the NDB API has set up enough memory to receive 2 batch sets; this means that DBSPJ can continue on the next set of rows even before the NDB API has processed the previous ones. This is another reason why Q12 can execute faster than Q6 although it has more work to perform.

    At the moment result rows are sent immediately from the DBLQH module as part of the fragment scans (or key lookups). This means that we will process rows in the NDB API that are not really needed to be handled there.
    It is not an inefficiency, since if the work is not done by the NDB API it has to be done by DBSPJ instead. But we can increase parallelism by handling this in DBSPJ.

    This possible increase in parallelism comes from two things. First, not sending unnecessary rows to the NDB API means that we have to wait less time for the NDB API to process rows. Additionally, by storing rows in the DBSPJ module we can increase the parallelism by using more memory in the data nodes for query processing.

    The conclusion here is that we have a number of wait states in the DBLQH module while processing the scan, waiting for the NDB API. We have similar wait states in the join processing in the DBSPJ module, waiting for the NDB API to process the result rows from the join processing.

    We have already implemented batch handling that makes the query execution efficient. It is possible, by storing result rows temporarily in DBLQH and in DBSPJ, to improve parallelism in the query execution.

    Next part.........

    In the next part we will go through a bit more complex query, Q7 in TPC-H, which is a 6-way join that uses a mix of scans and key lookups. The query is:

    SELECT
            supp_nation,
            cust_nation,
            l_year,
            SUM(volume) AS revenue
    FROM
            (
                    SELECT
                            n1.n_name AS supp_nation,
                            n2.n_name AS cust_nation,
                            extract(year FROM l_shipdate) AS l_year,
                            l_extendedprice * (1 - l_discount) AS volume
                    FROM
                            supplier,
                            lineitem,
                            orders,
                            customer,
                            nation n1,
                            nation n2
                    WHERE
                            s_suppkey = l_suppkey
                            AND o_orderkey = l_orderkey
                            AND c_custkey = o_custkey
                            AND s_nationkey = n1.n_nationkey
                            AND c_nationkey = n2.n_nationkey
                            AND (
                                    (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE')
                                    OR (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY')
                            )
                            AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31'
            ) AS shipping
    GROUP BY
            supp_nation,
            cust_nation,
            l_year
    ORDER BY
            supp_nation,
            cust_nation,
            l_year;
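    As a side note to the batch handling discussed in this part, two server variables on an NDB SQL node are worth knowing about when experimenting with batch sizes and pushed joins. This is a hedged pointer rather than part of the internals described above, and it assumes a MySQL Cluster (NDB) SQL node:

        -- Size (in bytes) used for NDB transaction batches.
        SHOW VARIABLES LIKE 'ndb_batch_size';

        -- Whether joins may be pushed down to the data nodes (executed in DBSPJ).
        SHOW VARIABLES LIKE 'ndb_join_pushdown';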
  4. Tungsten Clustering Makes The 2020 DBTA Top Trending Products List

    We're delighted to be able to share that Tungsten Clustering – our flagship product – is named in the DBTA 2020 List of Trend-Setting Products! Congratulations to all the products and their teams that were named in the 2020 list.

    We have been at the forefront of the market need since 2004 with our solutions for platform-agnostic, highly available, globally scaling, clustered MySQL databases that are driving businesses to the cloud (whether hybrid or not) today; our software solutions are the expression of that. Tungsten Clustering allows enterprises running business-critical MySQL database applications to cost-effectively achieve continuous operations with commercial-grade high availability (HA), geographically redundant disaster recovery (DR) and global scaling.

    Tungsten Clustering makes it simple to:

    - Create MySQL database clusters in the cloud or in your private data center
    - Keep the data available even when systems fail
    - Free you up to focus on your business and applications

    Its key benefits include:

    - Continuous MySQL Operations
    - Zero Downtime MySQL
    - Geo-Scale MySQL
    - Hybrid-Cloud and Multi-Cloud MySQL
    - Intelligent MySQL Proxy
    - Most Advanced MySQL Replication
    - Full MySQL Support, No Application Changes

    Tungsten Clustering comes with industry-best, 24/7 MySQL support services to ensure continuous client operations. Our customers are leading SaaS, e-commerce, financial services, gaming and telco companies who rely on MySQL and Continuent to cost-effectively safeguard billions of dollars in annual revenue. They include Adobe, Carfax, CoreLogic, F-Secure, Garmin, Marketo, Modernizing Medicine, Motorola, NewVoiceMedia, RingCentral, Riot Games, VMware and more.

    To find out more, visit our Tungsten Clustering page or contact us.
  5. Upgrading from MySQL 5.7 to 8.0 on Windows

    As you may know, I'm using MySQL exclusively on GNU/Linux. To be honest, for me the year of Linux on the desktop happened almost 20 years ago, and I'm very happy with that. But this week-end, I got a comment on a previous post about upgrading to MySQL 8.0, asking how to proceed on Windows. And in fact, I had no idea! So I spent some time installing a Windows VM and, for the very first time, MySQL on Windows!

    The goal was to describe how to upgrade from MySQL 5.7 to MySQL 8.0. So once MySQL 5.7 was installed (using MySQL Installer), I created some data using MySQL Shell. Of course I used the latest MySQL Shell, 8.0.18 in this case. Don't forget that if you are using MySQL Shell or MySQL Router, you must always use the latest 8.0 version, even with MySQL 5.7.

    Before upgrading, I ran the MySQL Upgrade Checker to be sure everything is compatible with MySQL 8.0. No problem, I'm good to go! Don't forget, now is the right time to perform a backup or a snapshot.

    The first step is to stop the MySQL service. When done, you have to launch the MySQL Installer once again and use "Modify your MySQL 5.7 Server product features" to leave only the Server data files checked. When that has completed, you return to the MySQL Installer Product Overview and Add a new product. We select the latest MySQL 8.0, and there is no need to select Server data files, as we will upgrade our current data.

    When that is done, stop the new MySQL80 service and modify the my.ini (of MySQL 8.0!) that is located in System\ProgramData\MySQL\MySQL Server 8.0 by default. In that file, we modify the value of datadir and point it to where the MySQL 5.7 datadir was located. In this example I only used the default values.

    And now comes the trickiest part of the upgrade process: when you save this file, you must specify the ANSI encoding. If you don't use the right encoding, when you start the MySQL service you will have a message like this in the error log:

    [ERROR] Found option without preceding group in config file... at line 1

    Once saved, you can start the service. It will take some time, as MySQL upgrades the system tables and other files, as you can see in the error log. When the service is running again, you can connect and you should have access to all the data that was in your previous MySQL 5.7. As you can see, the test schema is there and of course we can check the content too.

    I hope this post answers the questions of those that were experiencing issues when trying to perform an in-place upgrade from MySQL 5.7 to MySQL 8.0 on Microsoft Windows.
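    As a small postscript, a sanity check like the following can be run right after reconnecting (a sketch: the "test" schema is the one from this walkthrough's example data, and the table name is a placeholder to replace with your own):

        -- Confirm the server now reports 8.0.x
        SELECT VERSION();

        -- Confirm the pre-upgrade schema survived the in-place upgrade
        SHOW DATABASES LIKE 'test';

        -- Spot-check some content (replace your_table with an actual table name)
        -- SELECT * FROM test.your_table LIMIT 5;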
