Okay, we are live. Yes, I guess. Can you see something? Can you hear something? It looks good, it looks good, and it sounds good as well. Okay, cool. Oh, I didn't mean to show this screen with this deprecated IP address. Of course I want to advertise the domain I fully control and intend to keep paying for; it will always redirect to the current host. The reason I mention it so extensively this time is... wait, I really need my OBS on the side here. The reason I mention it so extensively is because the server has been running on a different IP since a few days ago. I did a world backup and we had a big downtime of, I guess, two days or something like that, and one night, so it was crazy. Yeah, I actually don't want to show that. The thing is, I'm going to install a third screen again. I was thinking about doing it before the stream, but then I was like, why, you know? And that's why I'm doing it right now. Okay... no space yet. Oh, where did I put my screen? Oh, that's so inconvenient that I put it there. Yeah, whatever. Okay, please don't die. Almost killed myself on stream. Oh, by the way, I killed myself off stream, in game. That's too bad, isn't it? Something's really hot here. I was messing around with my trap, and since I didn't finish it last stream, I decided to place the TNT everywhere, and then I tested it, and of course it blew up. So... where's my... here. Yes. And the reason I'm streaming now is because I found this correlation: when I'm streaming, the server has more players. I don't understand how that's the case, but it seems to be so, and I currently need more players for some performance testing, since I moved the server again to my local machine. I increased... oh yeah, because before I did the backup, the server was always crashing when there were... please don't die. Oh yeah, Minecraft in full screen is probably not the wisest of ideas. Ah, okay, well, that was an
easy fix, it's still configured correctly. Okay, so I can show the stream and OBS here. I can't even remember how I did it last time. I could open the live chat as well; now that I have three screens, that might actually be feasible. Can I resize this? Why is the live chat so big? Okay. So, yeah, you can join. That's why we currently run on different hardware, and I want to see how differently it performs, just for testing purposes. And yeah, this stream should probably be a duping stream, so I even need a second device, because I just blew up all my stuff. That is actually bad. I wonder where my second account is; let me quickly log in there. Well, "quickly"... oh my gosh, where do I put my microphone? Oh, I have to make sure... no, that's fine. Okay. Because one of my laptop's partitions has messed-up graphics drivers, but I don't need graphics drivers if I'm not recording, so it's fine if I boot into my favorite partition. Yeah, I know, starting the stream without information or action is generally a bad idea; viewers turn off if it's slow, if it starts this boring, especially when it's uploaded afterwards. But I don't know, it doesn't seem to perform better when I try hard to make these streams efficient, so to say. Efficient might be the wrong word, but you know what I mean. That's too old, isn't it? What version are we on? One second. Seems like I got bamboozled. Okay, I think we are at 1.16.4. Is that a breaking version? Wait, I don't know. Whatever, let's just launch this. No, no, wrong account, oh my gosh. "You're currently not logged in." Okay. So how interesting is the stream right now, what do you guys say? Oh, I actually have zero viewers. That means all the time I had one viewer, it was not myself but an actual viewer. And I was always talking like "nobody's watching anyways". Who was that one viewer, who probably got offended? Okay, that's a breaking version, so I have to
update my game. How do I even update my game? "Edit instance", "version", "change version"... or what? No, it's wrong here. "Change version to"... Isn't that just the patch? Like, with major, minor, patch, patches shouldn't be protocol-breaking. Isn't that a common scheme? So how is 1.16 not compatible with 1.16? Crazy. Okay, so where am I? Oh, I am at my dupe base. It's kind of messed up... no, it's wonderful. I will go there, that's such a nice place to do these things. We also get some charging for my duping device. I mean, it's still in the 200 by 200 radius, isn't it? Oh no, it's in the 1000 by 1000 radius, I think, so it might be a bit close, but we'll see. Okay, then let's head there. Where am I even? Anything that looks familiar? Well, that's definitely the way I'm coming from, as you can tell. How do I find... I mean, I knew the coordinates at some point. Well, I could also just check coordinates. I am at 300, 800. Please don't raid my dupes. Dash... wait, that's the wrong direction. Like this. What is making these sounds? Some of my devices... ah, it's my dupe machine, that's fine. Okay. Yeah, I think from here on I know where it is, don't I? Yeah, that looks familiar. Oh, or was that... ah no, it's the current one. The old one was right next to it, and then it got found and I moved a few blocks, something like that. Okay, well, at least the chests are still here, so yay. 15, 7, 6... those are all not very quality shulkers, nothing I want in my inventory right now. Okay. Ah, I need another mouse. Which one did I use? And which one is this? Should I use the same twice? Okay, can I plug it in here as well? Awesome, then I can wire this up like that, and can I move? Yes, I can. Okay, cool. So for everybody who is interested in how to dupe in 1.16... teen-eighteen-dot-forty... what? Dot four? I still don't know the conversion, it's not displayed anywhere, I know. Then, yeah, watch closely, since I'm about to do some magic. Oh, that's why it looks so messed up, because the floor was full of shulkers, wasn't it? Uh,
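About that "major.minor.patch" scheme from a minute ago: the assumption can be sketched in a few lines of Python. The helper names here are hypothetical, and note that Minecraft does not actually follow semantic versioning, which is exactly why a patch-level release can still be protocol-breaking.

```python
# Sketch of the semver assumption: under semantic versioning, only the
# patch component should be free to differ without breaking compatibility.
# Minecraft does NOT follow this scheme, which is why 1.16.4 can still
# refuse a client on another 1.16.x version.

def parse(version: str) -> tuple:
    """Split 'major.minor.patch' into integers; missing parts default to 0."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])

def semver_compatible(a: str, b: str) -> bool:
    """True if two versions would be protocol-compatible *if* semver held."""
    return parse(a)[:2] == parse(b)[:2]  # same major and minor

print(semver_compatible("1.16.1", "1.16.4"))  # True: only the patch differs
print(semver_compatible("1.16.4", "1.17.0"))  # False: minor version bump
```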
okay, so what do you want to do? Those are nice kit starters. So I'm missing the kit starter... oh, I think I moved more shulkers out before the server crashed, and that's why my e-chest is so full. So I only miss these two. Okay, that's nice. Then let's start with this baby. Then we go to the second machine, we hop into the llama, then I hover over the disconnect button on my second account, right? So I mounted the llama and I'm about to disconnect. Now I align both of my mice next to each other, so I have the right click and the left click in the middle, so I can press left click on the one account and right click on the other account with the least possible distance. I don't know if that makes any sense, but now I can smash both at the same time. And I make the move that you'd use to, like, poke out people's eyes, you could also call it the peace sign or something like that, and then I smash it onto the two mice. And as you can see, with this account I picked up the shulker while the other one was disconnecting, and since I was lucky, the timing was on my side. Cool, so now we have it both in here and in my inventory, so that's nice. We can refill that one; we still have this one in here, right? Cool. And now we do it again, and since it's timing-based, it's a bit luck-dependent, even with my, I would say, pretty solid setup where I can click both of them with one move. I use two devices, two mice, and so on, but as you can see, it doesn't work all the time, and that's bad. I have no idea how this works with one device. I don't think you can... maybe you can emulate two mouse clicks on one machine or something like that. It might be even easier if you have the right hacks installed that manage two clients at the same time. It was easier at some point, for sure. Okay, and that's how today's session is probably going to go. I will quickly search for a video that I... wow,
that's a little bit smaller. And I'm going to watch something, because this is going to be boring. Smart pointers... do we want to watch another Crust of Rust? Those are so... I want something easier on the ears. What about "Ownership, Closures and Threads"? No. Let's check out CodingTech, the rip-off-Creative-Commons tech channel. No, nothing interesting here. Do I have something in my watch-later? "Internet censorship in the Catalan referendum". Hmm. "Bootkits for Apple macOS". Nothing that really piques my interest, that's too bad. There was this one guy who gave me permission to watch his videos. I don't know if I want to watch his videos here... that one. What was he talking about? "Opening old wounds: why Uber Engineering switched from Postgres to MySQL". So that guy gave me, or gave another of my accounts, permission. Is there a hole in the roof? What? Oh, that's bad, gotta fix that, I don't wanna get blown up here. Also, is there enough light? I can't tell. I'm scared of creepers, and... yeah, that's probably enough light, I mean... oh my gosh, why is it so... man, it's so griefed. Okay, now I should be safe. Okay, cool. Let's see, what does he talk about... yeah, let's watch this one. "Opening old wounds", I can't pronounce it, I'm sorry, "why Uber Engineering switched from Postgres to MySQL", a video from 2020, it has 6000 views, from the Hussein Nasser channel, and I pronounced that wrong, as always. Cool, so let me quickly copy that, go over here, then we set that as the channel we watch talks from, and we set the title to the title. "Opening wounds"... not "words", oh my gosh, I should stop talking. Cool. And then we set the URL to that. If this is now... I'm gonna rage. "popbob is my daddy"? Oh my gosh, we have a 2b2t player here, awesome. Okay, so now I should have fixed the stream title. I wish I could resize it so that the chat is not so big. I can do it like that, but then I don't see chat at all. That's how I usually do it. Oh, I can probably also zoom. Yeah, that's juicy. Okay, okay,
cool, wonderful. So let's start from the beginning, maybe. "What is going on guys, my name is Hussein." An old but gold article. One second, let me get something to drink. Okay, let's go. "Why Uber Engineering switched from Postgres to MySQL", and this article was published on July 26, 2016. This article explains why Uber moved from Postgres to MySQL back in the day. I remember that this article got a lot of backlash from the Postgres community, and actually the whole database community, to be honest, because of how the language used in this article portrayed Postgres as if it's a bad database, right? They don't even say, "hey, by the way guys, this just didn't work for us, it doesn't mean it won't work for you". That was the main reason this article was heavily criticized. I'm gonna reference this article and the Hacker News threads that discuss it; there were a lot of discussions, some going deep, some kind of picking at the flaws of the article. But what I want to do in this video slash podcast is go through this article and through the main pain points that Uber had, then discuss them and give you my personal opinion on whether Uber really had to move from Postgres to MySQL or not, all of that stuff. How about we jump into it, guys? So first they explain their architecture right here: they have the monolithic back-end application written in Python that uses Postgres for data persistence, and they're moving, again, this is 2016, four years and change ago, to a microservices architecture, and, surprisingly, to a new system using Schemaless, a NoSQL database sharding layer built on top of MySQL. So we're going to talk about that a little bit. That is a little bit of a flag, you might say: what, why Schemaless on MySQL, that doesn't
make any sense, right? Exactly, that got a lot of people confused: okay, why would you pick MySQL for Schemaless? It's still broken... I hate my life... I don't think CockroachDB was born back then, but Fauna, Mongo, right, anything. But yeah, again, they have their own reasons, and that's another article. What I want to focus on here is the architecture of Postgres as they describe it. So here, for the people listening: we're reading now the architecture of Postgres, and I'm going to read from the article the five main pain points that led Uber to move away. Why can't I connect to the server? What happened? This is the article now: "we encountered many Postgres limitations". Are you kidding me? Inefficient architecture for writes, that's the first one. The second one, inefficient data replication. The third one, issues with table corruption. The fourth one, poor replica MVCC (multi-version concurrency control) support. What? And the final one, difficulty upgrading to newer releases. I kind of agree with some of them, because I use Postgres and I know how painful some of this stuff is, so I kind of related to that. I understand the upgrade process is a little bit easier nowadays, but nevertheless. I don't agree with all the points, by the way, I'm just reading them to you; I agree with some of them, and some of them are, to me, preposterous. So how about we jump into it. They walked through the limitations and decided to move to MySQL because it solves most of these problems for them. Let's take it one point after another. The first point here is called the on-disk format. They are describing in this section the on-disk format of Postgres, which implements multi-version concurrency control, and we talked about that many times on this channel: how the actual indexes are stored, how secondary indexes are stored, and how they implement their multi-version concurrency control using the transaction
id, the xid, right, and then xmax and xmin, and how a row becomes visible to my transaction. Once I go out of the scope of the transaction, I need to do a vacuum to clean up those rows that are no longer seen by any other transaction. That all comes down to isolation and all that stuff we talked about many times on this channel, so check out the ACID video to learn about isolation, atomicity; I'm not going to explain it right here. So what they are going through here: they have a table called users, and they're showing you how Postgres works. For the people listening to the podcast, we're looking at a table with four... oh, that's awesome, he has a podcast... columns: an id, which is a number, first name, last name, and birth year. Stop recording! How did that affect the stream? I started a recording without meaning to... oh my gosh. So what I wanted to say is, it's awesome that he seems to have a podcast, which comes in really handy for me while I play games, since he tries to make it accessible for people that just listen. I abuse YouTube in that regard. Anyways, I should be watching... listening to... podcasts, but I still glance over here, just not actively enough, so this might be a pretty good... how do you say... compromise? It's a pretty good format that I can easily consume and get the most out of, I think. And maybe that guy actually changes his license to Creative Commons in the future, if I can trust his comment response; then that might be another channel we can binge-watch here. Let's see about that. But for now I'm not disappointed. I mean, he didn't mess anything up. I'm not saying it's the best I've ever watched, and he's just talking about this article, but that's really nice, because I find it somewhat interesting and I would probably never read that article, especially not on
stream. So, that's that. Anyways, let's continue, and not start a recording. Those are string columns... and then birth year. And they show you how this is laid out on disk: there's a ctid, which is the tuple id that is stored, basically the tuple reference on disk. This is very, very important. So these tuples are A, B, C, D, E and so on. They have an index on the primary key, which is the id, and they have secondary indexes on first, last and birth year. So they have indexes on all of them, and again, this is just an example; they didn't show us their actual architecture, for security reasons probably, so we don't see their schema or anything like that. But this example tells me that they have a lot of indexes, so pay attention to that. So in Postgres, the primary key and secondary keys always point to the tuple id, which is the physical representation on disk, right? And here's how Postgres works: if you now go ahead and update a row, any row in this table, what it does is essentially insert a duplicate row with a new tid, right? And now that we have a new tuple id, we need to point the indexes, the secondary indexes and pretty much everything that uses this tuple id, to the new representation. That takes a finite amount of time, a finite amount of work for Postgres to do, because everything points directly to the disk, just like MyISAM in MySQL, that's exactly the same architecture, where everything points directly to the disk. And you might say, what's bad about this? There's good and bad. The bad thing is what they are explaining: hey, the moment we touch any row, I have to update all the indexes, including the primary key, because all those index entries have a new tuple id that they need to point to. So I have to update that, and that obviously has a ripple effect. They called it write amplification, and the
point is, right: one logical write, I updated a single field in a single row, results in five, six, seven physical writes to disk, because you're updating the secondary indexes, and if you have a lot of indexes it can get slower and slower, right? So bear with me here, I'm just explaining their point now. They go through all of that, exactly what I said, and, as a result, that slows things down. Because, first of all, writes are not slow per se; if you do batched, flushed writes, you accumulate a lot of writes and then do all of them at once. But the side effect of the writes, and we're going to explain it in a minute, is that one single thing translates to a lot of physical writes: this index, that index, and the write-ahead log, which is something we're going to mention a lot in this article, is large when you want to apply these changes. So that's the first thing they go through, the on-disk representation. Then they explain, unsurprisingly, replication. And replication here, guys: as I discussed, the write-ahead log is, basically, if I do an insert, if I do an update, this statement is translated into physical changes. Okay, go to this block and change this location and replace this value with this value, right? Or go to this index and change this value to this value, go to this index at this position and change these. These changes are written in the write-ahead log as actual disk changes. Okay? So this is a very, very important thing to know. So this is the write-ahead log; and the write-ahead log has its own structure, right, it's somewhere else and it's being maintained, so the write-ahead log also has its own physical representation on disk, which translates to SSD writes, so there is a lot of flushing going on. So now, when you come to replication, which we talked about right here, guys, and also discussed in my course Introduction to
Data Engineering: the idea of having a primary database accepting the writes, and standby replicas for reads that you can fan out to. To keep them up to date, you need to push these changes, and the way you push them is you push the write-ahead log, which is a very consistent thing, down to the standby databases. Wait, did I just do a deprecated kit? Oh no, it still has diamond tools in there. Hmm, okay, that's bad. Okay, let me quickly get rid of some trash. Okay, I need some kits, kits are always good. Ah, that was too early, way too early. No, it wasn't... well, then I don't understand how this game works, this was perfect. Okay... well, I don't know. Okay. So, efficiency: we want the non-Silk-Touch one, and put it in here. Here... raus! Oh my gosh, a German joke. Okay, is there anything else we can improve? I guess otherwise it's a top-notch kit. Maybe more TNT, fewer totems? I usually don't die that often, let's be honest, I'm pro. So I usually need more TNT, or slime. Hmm, what do we put in here, slime or TNT, or wood, or, like, e-chests? Nah. I don't know what to put in here, to be honest. Okay, let's put in some... I actually need a lot of slime, I would love to trade. Oh my gosh, I was thinking about how I can use the slot efficiently: I want iron and some other block in there, but I don't need that much iron, so I was thinking about using a C++ union. I'm such a nerd, I like it; it literally blinked into my brain for a second. I want two different types in one piece of memory, let's use a union! Oh my, I don't know why I'm telling this, it's somehow embarrassing and braindead, but it would be awesome to have half of the slot used as iron and half of the slot used as something else. But do you know if... is there a way... maybe I can transport the iron less efficiently and ship something else, something that you can craft back to iron, something like... these anvils. Wait, can you craft
anvils back? What am I even doing here... but, um, no, you can't. It would be funny if I had anvils in my chest. Anyways, that doesn't work. Is there anything else, other than iron blocks, that can be reverted to iron, anything that drops iron? Nah, I don't think so. Too bad. I could use ingots instead of blocks, so I don't have to craft. Hmm, more TNT or more slime? And also, the same with the... I don't need so many blocks of redstone. Should I throw away more totems, or gapples, or... such a hard decision. Okay, that's a better kit now, for sure. Is it really good, though? I don't know, I see space for optimization, but I don't see where exactly. Yeah, let's go, that's the good one, right? Put the bad one in here, and let's do a few more of it. Simple. And what they explain here, in the replication part, and this is where their point about the limitations of Postgres comes in, guys: you have this write-ahead log, which is quite large. Why? Because a single update statement translates into multiple writes, and those writes make their way into the write-ahead log. The write-ahead log doesn't contain "update this table"; it's not statement-based replication, though Postgres possibly supports statement-based replication through third-party tools. I believe MySQL supports both statement-based replication and row-based replication; there are, again, pros and cons for both. So now, when we try to apply that... sorry, the dog was barking. All right: so the master database pushes the WAL changes down to the standby databases so they can get updated. But you might say, what if the standby is actually executing a query? Do we just stop that query? Let's say I'm executing a query on the standby to read something that happens to be deleted on the master, and that change is being written down directly. Do I stop that query? Do I wait? All these questions are gonna get answered in a minute,
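As an aside, the tradeoff he just walked through, shipping physical WAL records versus shipping the SQL statements themselves, can be sketched with a toy model. Nothing here is real Postgres internals; the record size, the index count, and the example statement are made-up illustrative numbers.

```python
# Toy model of physical (WAL-shipping) vs statement-based replication.
# Physical replication ships every amplified disk change; statement-based
# replication ships only the SQL text, but the replica must re-execute it.

WAL_RECORD_BYTES = 64  # assumed size of one physical change record
STATEMENT = "UPDATE users SET first = 'Bob' WHERE id = 4;"

def physical_records(num_secondary_indexes: int) -> int:
    """One logical UPDATE -> new heap tuple + one entry per index."""
    heap_write = 1
    primary_index_write = 1
    return heap_write + primary_index_write + num_secondary_indexes

indexes = 3  # first, last, birth_year, as in the article's example
wal_bytes = physical_records(indexes) * WAL_RECORD_BYTES
stmt_bytes = len(STATEMENT.encode())

print(f"physical replication ships  {wal_bytes} bytes")   # grows with indexes
print(f"statement replication ships {stmt_bytes} bytes")  # just the SQL text
```

The point the article hinges on is the first number: with many indexes, the physical stream grows with every index touched, while the statement stays constant in size but costs the replica the same execution work the master already paid.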
and as a result they shaped their decision to move to MySQL. I'm going to explain it to you. So now, we've talked about the on-disk representation, we've talked about replication, and now we're going to talk about the consequences of Postgres' design; that's the third part here, the problems of Postgres. So let's enjoy this. The first problem is write amplification. Write amplification is originally an SSD thing, where a single write that you think is logical translates to many, many physical writes, because the SSD does its own thing. When you update versus insert, the SSD does a slightly different thing. SSDs love inserts: you logically just insert new things and create new pages. SSDs do not do well with updates, because the goal of an SSD is to have a page and flush it; in order to update an existing page, you have to invalidate that existing page, take it, copy it, change it, and then write it. So there's a little bit more work when it comes to an update versus an insert, which is faster. And that's the reason why Google invented the LevelDB database, and then why Facebook invented RocksDB on top of that, I think: to take advantage of SSDs they built a completely different structure called the log-structured merge tree, where everything is optimized for inserts instead of updates, right? Almost everything is an insert in the log structure. So that's the idea of SSD write amplification. Now take that and amplify it at the Postgres level: at the client, I'm doing a single update statement to my table, and if I have like 700 indexes, I just made 700 physical updates as a result of my single statement. Okay, do you need another starter kit? Is this one actually a good one? These 700 updates at the database level also translate to many, many physical, amplified SSD updates, because the writes go to pages. And this is
the fourth thing: SSDs have a limited shelf life. So if you have a limited shelf life like this, you can only write so much; there's a number that varies from one disk to another, but in general it's, I think, 12,000 times or something like that for most of them. So they explain this here, I just summarized it for you: write amplification is a problem for them. The life span of their SSDs is getting lower and lower because of the write amplification, because those guys have hundreds and hundreds of indexes. Why would you have that many indexes? Beats me. Do you really query on all of them? Do you really query on first name, on last name? That's the thing: adding indexes is great, adding too many indexes is just a bad idea. So that's the write amplification problem. The second problem they want to discuss here is the replication problem, guys. Take the same thing: we did a single update that translated to lots of updates to all the indexes, because all of the indexes point to the row directly, and the tuple id changes, so we have to make them aware of the row change. All these indexes point to the row directly, so these changes are just amplified. And in this case the changes translate to what? To write-ahead log entries: hey, update this physical location on disk, and this index, and this one, and this one, and by the way, there is a row here, change this value to this. What they complain about in their application is that this WAL translates to a large bandwidth when it comes to their master-to-standby replication, and those go across states; they have their replicas across states, across different countries, so they had to buy expensive bandwidth to transmit their WAL changes from this replica to that replica, and I believe they also have chained replication, so take that into consideration. So as the WAL changes grow large, the bandwidth becomes expensive, because they are very large, and, you know, they're not making
because they are very large and you know they’re not making Small updates they’re making large updates which means they’re even larger so that’s that that’s the limitation problem here in case i’m going to read this this section for you guys so um so you can learn more about it in case when where postgres replication happens purely within a single data center the Replication bandwidth may not be a problem modern network equipment switches can handle our audio and many host providers offer free or cheap intra data center bank right if you are internally i can transfer one gig of wall sizes easily however when replication must happen between data centers issues can Quickly escalate for instance uber originally used a physical servers in collocation space i don’t know what what the heck is a co-location collocation space on the west coast for disaster recovery purposes we added service in the second east coast colocation space in this design we had a master postgres instance plus replicas In western data center and set of replication so that that kind of con the constraint you can see from east to west just just did did not scale for them right because of that you see one something i want to carry around with me these are much more efficient You see the pattern guys rights are big because they have a lot of indexes that’s where you should start why do you have this much indexes you might say hey i cannot live i have to have 350 indexes on all my fields because i query against them Well in this case i was like okay maybe that’s not a choice for you then but try to avoid that first but that’s that’s why that’s what i didn’t see and that’s why people are pissed it’s like wow can did you did you really exp didn’t you explain why do you guys have A lot of indexes can you explain why do you need i bet if you go into the actual architecture most things don’t need this much indexes as a result you will not translate to a huge uh right amplification consequences you will not 
have that, because you'll not have a lot of indexes to update, right? But, well, we're not Uber, we don't know their architecture, so that might be a valid use case for them. Let's go to the data corruption. This is the dumbest section in this whole article; I'll save you some time. What they say here is: hey, during replication, Postgres 9.2 had a bug in it, and our tables were corrupted as a result. Seriously? Seriously, Uber? What software doesn't have bugs? You're citing a bug as a reason to move from Postgres to MySQL, like MySQL is perfect? That's just odd, that's just, to me, I'm sorry, that's just odd. So they said that during the replication process, the replicas were not in sync for some reason, and as a result, when you query for a unique value, say SELECT * FROM users WHERE id = 4, you should get one row, right? They were getting two: they were also getting the old, retired row, for some reason. And that caused their application to fall apart, so they had to add defensive programming to catch this stuff. But it's a bug! If they had notified the Postgres team, they would have fixed that bug immediately. It's a valid bug, but I don't see bugs as a show stopper, as a reason to move, in my opinion. So they talk about that, and in one section they talk about the B-tree rebalancing, which, by the way, adds to the write amplification. They don't mention that, but it's implied, because a lot of people know that when you insert something into a row and you have a lot of indexes, you keep updating those indexes, naturally, if your value touches that index, right? But as a result of inserting, the B-tree structure might need to rebalance itself, and when it needs to rebalance itself, it actually does physical updates to the tree, updates, not inserts, right? So updates translate to
what? To actual SSD write amplification, because SSDs do not like updates. That's another thing that can amplify the writes, if you go to millions of rows, right, obviously. Let's move to the next one: replica MVCC. All right, replica MVCC, or replica multi-version concurrency control. The article says Postgres does not have true replica MVCC. Why? Because of the fact that replicas apply WAL updates directly. If you think about it, Postgres by default takes the disk representation of the WAL changes, and that's what gets transmitted. So it's often higher bandwidth, but if you think about it, it's faster to apply, right? The alternative is to do statement-based replication, where, instead of sending the results of the execution of the queries, you send the queries themselves: hey, I just did an insert, I just did an update, the actual strings of the statements, the SQL, you just send them to the replica. The bandwidth of transmitting these changes in the form of statements is smaller than the actual physical changes that happened. However, applying them on the replica: inserts might be okay, but what if you do an update, for example? An update could scan, could touch the index, it actually does work. So you did double the work, technically, right? Because you did the work to execute the statement on the master, and now you're doing exactly the same work on the replica, and that statement is expensive; you pay the same cost on the source and on the destination. So there are pros and cons for using both. But they are complaining here that Postgres' WAL-based replication just doesn't give them MVCC support on the replica. So let's clear that up. Let's say I'm on a replica, a standby, and I'm executing a query, and one of the WAL changes affects that query that is being executed on the standby. Okay: I have a master,
i deleted let’s say i deleted a table that’s just a little bit harsh but let’s say deleted a few rows right and now on the understandable diamonds i’m actually querying those rows that’s being deleted on the master i am on a different replica So now i am pushing the master pushing the wall changes to the to the standby while that query that squaring those deleted roses being executed what should postgres do you you tell me as the viewer listener what should what do you think should happen here should the postgres immediately cancel the query Right and and write the changes or should the should the wall changes be paused until the query finishes if you think about there are no other choices right you have to pause it obviously you’re not posing all changes you’re only posing changes that affect running transaction and that’s another Thing to worry about how the heck do i know that that queer that being executed actually affects my world changes building databases is not easy guys look at all this complexity so they’re complaining here that you guys don’t have mv vcc support because what you’re doing is what posgus does effectively is essentially Having a timeout says hey we’re gonna we’re gonna block the wall changes for a given time on and they give you this timeout configurable if the query didn’t finish in this amount of time we’re sorry we’re going to cancel those changes we’re going to cancel that query that is actually querying its reading And while we’re applying we’re going to force applying the changes why because posters design favor eventual consistency over let’s say just reading queries right in this case so i’d rather be eventually consistent remember eventually this is eventually consistent as well so stop saying that nosql is the only database has evangelical systems every Database has it as long as between replicas right relational doesn’t in the same same same instance yeah that’s completely consistent but across replicas there is always this idea 
So what Postgres actually does is kill the query on the standby, and they did not like that.

Guys, let's read this part, because I kind of disagree with the statement. They write that this design means replicas can routinely lag seconds behind the master, and that it is therefore easy to write code that results in killed transactions. This problem might not be apparent to application developers writing code that obscures where transactions begin and end. For instance, say a developer has some code that emails a receipt to a user; depending on how it's written, the code may implicitly hold a database transaction open until the email finishes sending. Well, that's just a bad idea, right? You don't hold a transaction open while you do stuff that has nothing to do with the transaction. Try to avoid that as much as possible; that's just best practice. As they admit themselves, it is always bad form to let your code hold a database transaction open while performing unrelated blocking work.

They then add that the reality is that most engineers are not database experts and may not always understand this problem. I have to disagree with this one again. If you know me from this channel or the podcast, you know that as an engineer you have to take pride in your work and in the things you interface with. I believe you have to understand what you're communicating with. Sure, most engineers are not database experts, but this does not require being a database expert; this is just basic transaction management, in my opinion, and I believe engineers have to understand it. I don't like to work with anything I don't understand. If it's a black box, I don't want to work with it. Before I pick a tool, I have to understand fully how it actually works, from zero to a hundred, if I'm going to build on it. If I'm merely connecting to it or integrating with it, it's okay if I understand 70 or 80 percent of the tool; I'm not going to understand every single thing in that case. But that's just me; you might have a different opinion. Next point: Postgres upgrades.

Yeah, that approach is actually nice, because first of all you run into fewer issues: all the obvious bugs and pitfalls are known to you if you know the tool you're working with. And when you do run into issues, you quickly know how to debug them, because you understand what's going on. On the other hand, this requires a ton of time. Before working with something new you have to investigate everything, and that can delay a project so much that you decide against the new tool entirely and stay with the stuff you know, because you have no time, or it doesn't seem worth investing so much research into a tool you don't even know yet. That can lead to never trying out new things: you're really good with the tools you know, and at some point you lag behind. It can still work out fine if your set of tools is stable and doesn't get replaced.

I would also say I aim for something similar: I feel uncomfortable working with something I don't understand, so I try to figure things out before I actually use them. But I'm trying to fight that habit of mine and be more open to new technologies, because I've noticed that once I know my tools and how they work, I don't see any reason to spend time looking into new stuff, so I just don't use it. What I try to do nowadays is use new tools as I go: if I find bugs, I research those bugs and learn about the problems as they arise. That's fine especially when it doesn't matter if the project blows up; if you run into serious issues in a production environment, that's bad, but this way you can learn a tool on throwaway projects and already know it much better when you use it in projects that matter.

You have to balance it. It's maybe a bit harsh to say you never touch anything before investigating it fully, because that might block you from getting things done with the right tool. But I totally feel him. I also have this sort of anxiety: if I use something I don't fully understand, jumping from example to example, and there's this one thing bugging me that I can't explain, it gives you the feeling that you are not in control, that it only works because the example code you copied from somewhere works. If you then want to change something, or something goes wrong, you're stuck, and that's intimidating to me.

In detail he is probably not that strict, and it's also subjective how you define fully knowing a tool and how long that takes, because if you go truly all the way, it's simply not feasible time-wise. If you took his words literally, that he always wants to understand one hundred percent of any tool he works with, then, depending on what counts as a tool, he could hardly turn on his PC without reverse engineering his bootloader, since that runs his machine and is a crucial tool he interfaces with, and he'd have to read all the code of his web browser, the documentation of his shell, and so on. Sometimes that is overkill. It's a nice habit, of course, but this mentality of understanding everything sometimes blocks you from getting things done; sometimes things just work if you do them. I'm sure he balances it off and is exaggerating a bit, or I might be misunderstanding him.

I'd say I'm similar in that regard, but I don't go as wide, and for me it's more a feeling-based thing than a fact-based thing. I have to feel comfortable with a tool, to believe I understand most of it, or at least the parts I actually touch to get my projects running. If there's some part I don't use and don't know about, that's fine, as long as the stuff I'm using makes sense to me. To take a concrete example: I don't even have to know what a SQL UPDATE statement looks like syntactically, or even that it exists, if the only thing I ever do is insert. What he is referring to, if I understand him correctly, is that before he even uses SQL, he wants to understand what all these statements are and how they work in the background, with the tuples and everything. I don't have to do that to feel comfortable; maybe I'm different in that regard. I don't know why I'm talking so long about this; I just felt like talking, as always.

Ignore what I say and check out the original videos, to give them the credit and the feedback, not here. If you're watching this on the YouTube channel ZillyGurke or whatever it's called, then it is not me, Zillyhuhn, the maintainer of Lasergurkenland, talking about databases; it's Hussein Nasser. Check out his video; it's linked in the description. I just wanted to make that clear, because from time to time, like essentially all three comments I got this year, people write comments under my videos that sound like they are responding to the talks, as if the speaker were me, and that's a bit weird. Whatever, let's continue.

Many times I just didn't find the right tutorial, or it was so complicated that I gave up, and they reiterate the same problem here; I have to agree with them one hundred percent. Postgres upgrades are really painful. Really painful. I've been there, upgrading across major versions in the 9.x era, 9.3 to 9.4, 9.4 to 9.5, and I just gave up. I'd rather recreate my databases from scratch after that. Obviously I was running a test database; I never ran a production database that I had to upgrade, and in that case recreating is what I did. Obviously there is a way, but apparently it sometimes works and sometimes doesn't; there is also the pglogical way of doing it, and there are some tools that allow you to do upgrades. Guys, if you know any of this, if you have ever upgraded a Postgres database smoothly, let me know.
In the comment section below; I'd love to know how to do it. I tried twice, I believe, and I gave up and said: you know what, this is not straightforward, and since I wasn't forced to do it, I took the easy route of recreating my data.

Okay, the architecture of MySQL. We've talked about it before; check out the video on the channel if you want to learn more. Now they go through their on-disk representation, MySQL, or InnoDB in general, compared to Postgres. The point is that InnoDB's primary key is a clustered index: the primary key points to the row directly, to its physical location, and all the secondary indexes you create point back to the primary key. That's the powerful thing here for them, because now if I update anything in a row, only the primary index needs to know the new location of the row, and even that works a little differently; I don't have to touch my secondary indexes.

That being said, guys, that's not always true, and they didn't mention it. If you're updating a field that has no index, then yes, you only touch the primary key side. But if you update a field that does have a secondary index, you have to touch both: you just changed the indexed value, so you have to go into that index and change the tree to include the new value. This is a very defensive article, "MySQL is perfect", right? So yes, if you touch a lot of indexed fields, you touch a lot of indexes. But by design, even with a lot of indexes, you get fewer changes in general, and as a result this design obviously produces fewer log changes to ship, because there is less logical-to-physical translation work. And now they talk about the rollback mechanism.
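To make the contrast concrete, here is a toy Python sketch. Nothing in it is real database code; the dictionaries just stand in for a heap and the two index designs, and it deliberately ignores Postgres's HOT optimization. It shows why a physical-pointer secondary index is invalidated when a row version moves, while a primary-key-pointing one is not, as long as the indexed column itself didn't change:

```python
# "Postgres-style" secondary index entries store a physical slot (like a
# ctid); "InnoDB-style" entries store the primary key value.

heap = {0: (1, "alice", 100)}      # slot -> (primary_key, name, balance)

pg_name_index = {"alice": 0}       # name -> physical slot
innodb_name_index = {"alice": 1}   # name -> primary key

# Update a NON-indexed column (balance). In the Postgres model this
# writes a new row version at a new physical location, so the
# physical-pointer index now points at a dead slot unless it is updated
# too. In the InnoDB model the primary key didn't change, so the
# secondary index is still correct as-is.
heap[7] = (1, "alice", 50)         # new version lands in slot 7
del heap[0]                        # old version eventually cleaned up

pg_stale = pg_name_index["alice"] not in heap        # stale pointer
innodb_ok = heap[7][0] == innodb_name_index["alice"] # still valid
print(pg_stale, innodb_ok)  # True True
```

The extra hop through the primary key is the price InnoDB pays for this stability, which is exactly the trade-off discussed next.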
The rollback mechanism. MySQL does not use the Postgres approach here. When you update a row in Postgres, you insert a new row version in the heap itself, in the table itself. MySQL and InnoDB do it differently: they copy the old row to another place, called the rollback segment, the undo log, and keep it there, and then they point to that location in the rollback segment. So it's a slightly different architecture. If you query the latest version, the latest is always right there in the table; that's the beautiful thing. But if your transaction is coming from the past and you want the old results, you have to do the jump, go back into the undo log to get the old version, every time. That jump doesn't exist in Postgres, so concurrent queries against older versions are fast on Postgres, and on MySQL they are technically slower, because now you have to jump back and walk through different places to answer the query. And vice versa for reading the latest version.

Then they explain, and this is for people listening on the podcast while we're looking at a picture: the secondary indexes point to the primary index, and the primary index points to the disk. That's just excellent.

And then they claim that the replication functionality of MySQL supports multiple replication modes, statement-based and row-change-based, and that the moment you use statement-based replication you have true MVCC support, because now the change coming to you from the master on the standby is just another write to consider, another transaction trying to be executed. So it will have true MVCC support in that case; it will not be blocking.
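Before moving on, the undo-log jump described a moment ago can be sketched in a few lines of Python. This is a toy model, not InnoDB's actual format; plain transaction ids stand in for real snapshots:

```python
# Toy MVCC via an undo log: the table always holds the latest version;
# readers with older snapshots follow a chain of undo records back in time.

undo_log = []  # list of overwritten versions, oldest first

row = {"txid": 1, "value": "v1"}   # latest version lives in the table

def update(new_value, txid):
    undo_log.append(dict(row))      # copy the old version to the undo log
    row["txid"], row["value"] = txid, new_value

def read(snapshot_txid):
    # Latest first; if it is too new for our snapshot, jump into the undo log.
    if row["txid"] <= snapshot_txid:
        return row["value"]
    for old in reversed(undo_log):
        if old["txid"] <= snapshot_txid:
            return old["value"]
    return None

update("v2", txid=5)
print(read(10))  # v2: the latest, no undo jump needed
print(read(3))   # v1: an older snapshot has to jump into the undo log
```

The asymmetry is visible: current readers never leave the table, while readers from the past pay for the jump, which is the opposite of Postgres keeping all versions in the heap.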
It will not be blocking, because you can technically query and write at the same time. As a result, you can implement the same exact thing on the replica: because you have a logical view of what is changing, the database is aware of the change and can implement MVCC at that higher level, even through replication. By the way, Postgres does support that: there is third-party logical replication you can install that does exactly this; they just didn't mention it. And guys, this is an old article, so things can change, obviously.

They also say, by the way, that their log sizes are so small because they are changing very few things. They go through all of that; I'm not going to, but that's essentially their claimed advantage.

They go through another claimed advantage of MySQL: the buffer pool. The buffer pool is the caching mechanism in MySQL, compared to the caching mechanism in Postgres, which basically leans on the operating system's page cache. They explain the difference, and they claim Postgres uses different operating system calls, two calls instead of one, presumably a seek followed by a read instead of a single positioned read. I don't know much about that, to be honest; I'm not an expert in operating systems. A lot of people say you should use one call that seeks and reads at the same time instead of seeking and then reading, but maybe operating systems have actually changed since then. Some of the people listening to and watching this channel are experts in this area and might correct that part, but I'm not aware of the details, so I can't comment much on it.

Then there's the InnoDB storage engine, which implements a least-recently-used buffer pool that you can apparently control and size. I'm surprised by the claim that you cannot control the cache size in Postgres; I need to read more about that.
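The least-recently-used policy mentioned here is easy to sketch. This is a toy buffer pool in Python, with page ids and a fake disk read; it has nothing InnoDB-specific beyond the eviction policy:

```python
from collections import OrderedDict

class BufferPool:
    """Cache a fixed number of pages, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> page, oldest-used first

    def get(self, page_id, read_from_disk):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # hit: mark as recently used
            return self.pages[page_id]
        page = read_from_disk(page_id)        # miss: go to "disk"
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
        return page

pool = BufferPool(capacity=2)
reads = []                                    # record every disk access

def fake_disk(pid):
    reads.append(pid)
    return f"page-{pid}"

pool.get(1, fake_disk)
pool.get(2, fake_disk)
pool.get(1, fake_disk)   # hit: no disk read
pool.get(3, fake_disk)   # evicts page 2, the least recently used
print(reads)  # [1, 2, 3]
```

Page 1 survives the eviction because the hit refreshed its position, while page 2 is gone; that refresh-on-access is the whole idea of LRU.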
that’s another thing they they say that there’s another advantage in my sequel then another thing it says connection handling both in my sequel there’s a thread pair connection htcp connection to You open to my sequel is a thread on the server side however poscas it’s a actual process so technically now they they claim obviously in them and a thread is cheaper to spit off than a process i read i read that this is no longer true because the process Answer is almost identical now but could be back in the days could be that that was true but now if you think about it to scale 10 000 connections right now if you think about it opening opening a lot of tcp connections is just a bad idea so that’s that’s why we have The idea of connection pooling right we built our application so they use a pool reserve a pool reverse reserve a connection from the pool execute that transaction and then return it to the pull right and if you are you’re doing a single atomic trans uh statement that executed You can just execute on the pull directory say hey pick any poll any instance in the pool execute and then return return it immediately this reserve and release is also back to their queries if they have a queries that span three four five seven minutes and again nothing wrong with a query that That transaction stands long if you’re actually doing all database works some some sometimes actually i’ve seen transaction that takes 30 minutes just because it does a lot of work it changes a lot and these changes has to be atomic right yeah you can argue that you can break it even then You can you have to break these transactions into smaller and smaller smaller small small pieces so that each piece can be executed in its own atomic weather right so you can minimize the transaction side so this this also results in if you have a long running transactions then you have to Really think about how do the reservation and connection polling works right so the number of connection right think about 
The idea is that if a client is not actually using a connection, it shouldn't hold one open; use connection pooling. They say they use PgBouncer, I believe, some service that does that connection pooling for them, and a lot of applications do it; even if you don't use such a service, you can build your own pooling layer on top, and I've shown connection pooling on Postgres many times on this channel.

And hopefully in the future, and we're near the end of the article here, obviously, when it comes to connection pooling, I really hope that QUIC as a protocol, and MASQUE, a new protocol I believe they're working on right now, will let you open multiple streams on a given TCP connection, or UDP connection in the case of QUIC, where each stream represents a database connection. If MySQL or Postgres supported QUIC, and I don't see a reason why not, then the client, and remember the client here is usually your application server or something like that, could open a single connection and have up to 200 streams, or even more than that, running concurrently in that single connection. The only trick is that the database has to understand the idea of a stream, and that's a lot of work, but I believe it would be really lucrative for a database to implement a protocol like that. A single TCP connection per client, or connection pooling, is just wasteful; this has to go away, and we should move to a model where we multiplex queries over a single TCP connection using a protocol like that. They don't even have to use QUIC; they could implement their own protocol that supports multiplexing, so that every request, every session, every channel has its own logical representation.
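At its core, the multiplexing idea amounts to tagging each message with a stream id and demultiplexing on the other side. A toy sketch in Python, with a list standing in for the single wire:

```python
# Toy multiplexing: many logical sessions share one connection by
# tagging each frame with a stream id; the receiver demultiplexes.

frames = []  # the single "connection": an interleaved sequence of frames

def send(stream_id, payload):
    frames.append((stream_id, payload))

# Two logical sessions interleaved on the same wire.
send(1, "SELECT 1")
send(2, "SELECT 2")
send(1, "COMMIT")

def demux(frames):
    streams = {}
    for sid, payload in frames:
        streams.setdefault(sid, []).append(payload)
    return streams

print(demux(frames))  # {1: ['SELECT 1', 'COMMIT'], 2: ['SELECT 2']}
```

The demux step is exactly the extra CPU work discussed next: the receiver can no longer treat the socket as one flat byte stream but has to reassemble the logical channels.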
Each stream has its own logical representation inside the one TCP connection you open, so you don't have to open many connections; you can open just one, or a few of them, each with some limit. Obviously that doesn't come for free, because you increase the CPU cost on both the backend and the frontend: now you have to reassemble these channels and streams. That's the problem with HTTP/2 and QUIC; Lucas Pardue, Christopher Wood, and the other people working on the QUIC protocol are trying to solve this CPU-usage problem. You're no longer working with just a stream of content coming from a TCP socket; you have to actually look at the data, arrange the packets into logical streams or channels, and then deliver them to the app, so the operating system or the application, wherever this lives, is doing extra work. Again, sorry about that segue, but I wanted to discuss it a little; I think it's just a great idea.

The conclusion. They say: Postgres served us well in the early days of Uber, but we ran into significant problems scaling Postgres with our growth. Today we have some legacy Postgres instances, but the bulk of our databases are built on top of MySQL, typically using our Schemaless layer. That's another point: they now run schemaless on top of MySQL. Maybe there's something I'm missing here, but it doesn't seem natural to me. A lot of people use Postgres as a schemaless store, where they put a chunk of JSON in a single JSONB field and work with that. Maybe that's just the way forward for them, because they have a lot of fields and a lot of indexes on those fields; maybe that's the way to go. Who knows.

Again guys, what do you think about all this? Let me know in the comment section below, and I'll see you in the next one. Hope you enjoyed this video.
Give it a like if you do, and share it with your friends. I'm going to see you all in the next one. Thank you, Evan, a staff engineer at Uber Engineering; this is a great article. And yeah, things have been changing a lot in the Uber world since; this is a historical article that goes back years and years.

Okay, what does that mean, do I end the stream now? I feel like continuing. I guess I'll end the stream and maybe start another one again; we'll see. So that's it for now.

Video Information
This video, titled ‘Minecraft Anarchy – Why Uber Engineering Switched from Postgres to MySQL’, was uploaded by ZillyGurke on 2020-11-18 23:25:13. It has garnered 27 views and 1 like. The duration of the video is 01:29:06, or 5346 seconds.
Lasergurkenland anarchy server domain: lgl.zillyhuhn.com
Small pure vanilla minecraft server. No plugins. No admins. No rules. Chilled anarchy server with stable tps and no queue. No world resets and stable uptime. The server will stay online for at least a few years.
Hussein Nasser talks watched in this video: Opening Old Wounds – Why Uber Engineering Switched from Postgres to MySQL