Minecraft JARVIS-1: Memory & Decision Making

Hi everyone, and welcome to today’s session. Today we’ll be talking about the latest LLM-based model to solve the Minecraft environment. The name is JARVIS-1: open-world multi-task agents with memory-augmented multimodal language models. It sounds very big, but the main thing is that, compared to previous models like Voyager and Ghost in the Minecraft, they use images as input, and they also store these images in the memory so that they can be retrieved later. So let’s take a look at how they solve it.

For example, let’s take a look at how to craft a wooden pickaxe. As the video runs, you can see at the top left that the main task is the wooden pickaxe. The environment input is the RGB pixels of the entire screen; nothing has changed.
Okay, that includes the health bars and everything; all of these will be the inputs. The position is given separately. Then there’s a goal: you can see the sub-goal planning here, so right now they want to go to the crafting table and after that craft the stick, and so on. You can see all the mouse actions and everything. Based on this goal, a prompt is given to the controller, for example “craft wooden pickaxe”, and the controller needs to interpret the environment inputs and map them down to mouse actions or keyboard actions.

That’s one example; we can take a look at another one. Again you can see the goal over here, which comes from the planner: you break down the task of the wooden pickaxe into separate, manageable goals. Over here the goal is to get the birch log, then create the birch planks, and then create the crafting table. This is just an example to show that in different environments you have different ways of creating the wooden pickaxe, and JARVIS-1, because of the way it’s trained (I’ll explain how it’s trained later), is able to achieve some diversity here and still fulfill the goal regardless of the starting environment it is in. So this is Minecraft for you. All right, I’m going
to go to my slides now. Maybe give me a thumbs up if you can see the slides; these are the slides for JARVIS-1, with this verdict over here. Can you all see the slides? All right, great, thanks David.

Over here I would like to say that, in terms of what I feel about JARVIS-1, I think it’s very cool that they use image input, because as you already know, image input is very high-dimensional. Each pixel has 256 levels in each of red, green and blue, and some formats even include transparency, and there are also a lot of pixels, so the input space for images is huge. How exactly are they going to process this input space? The previous large-language-model approaches to Minecraft cast everything into text and processed the text, which is a much more manageable space. How, then, does JARVIS-1 use images for planning? That’s one thing to look at. Overall, though, I feel that their planning process could improve; I’ll explain why later. Let’s dive in.

Has anyone here heard of Voyager? Just give me a show of hands: who has
heard of Voyager? Voyager was the previous model that used large language models to solve the Minecraft environment, and it was one of the first to use something called the automatic curriculum, where the LLM suggests, from the pool of tasks, which ones are more manageable based on what tasks have been completed and what tasks have not. JARVIS-1 also uses something similar to the automatic curriculum.

Next we have skill learning. This is done through prompting with knowledge from the Minecraft Wiki: for example, what ingredients and materials you need in order to craft the stone sword. You can give that as knowledge, then give some example programs, and ask the model to generate a program few-shot. For example, this is the “combat zombie” program. In order to create this program, it also needs to use other things: if there are things you have done before, like “craft stone sword”, you can refer to your memory bank and take out the programs “craft stone sword” or “craft shield”.

In some sense, the more tasks you solve, the more reference functions you have, because after you solve a task you can add it to the skill bank, and then you can compose them into a more complicated function. This is the idea of learning in Voyager: the ability to generate more and more complex functions based on what you have in your memory bank, which is basically your skill library. This idea of getting better and better at the environment by gaining more skills, I think this is
something interesting, and it is also something that I tried to do for the ARC challenge; I was thinking about this idea. However, the ARC challenge is not as straightforward as Minecraft, because in Minecraft the kinds of actions you need are quite fixed: you need to attack the zombie, you need to craft items, you need to move here and there. The ARC challenge is like an IQ test for computers: you are given an input grid and an output grid, and you need to find the rules that map one to the other. The action space is not obvious.

One downfall of this skill approach is that if you cannot learn the skill, you cannot learn anything. So, you see, the student only remembers the good stuff. In JARVIS-1 they don’t have this skill prompting, but what they do have is memories of the entire experience, and you can leverage that memory to make plans. Okay, let’s take a look at the other one.
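To make the Voyager skill-library idea above concrete, here is a minimal sketch, assuming skills are plain Python functions that append steps to a trace. The class name `SkillLibrary`, the method `add_if_successful` and the skill names are all hypothetical and are not Voyager’s actual API; the point is only to show that successful skills are stored, later skills are composed from them, and failed attempts leave nothing behind.

```python
from typing import Callable, Dict, List

# A skill is modelled as a function that appends primitive steps to a trace.
# All names here are hypothetical; this is not Voyager's actual API.
Skill = Callable[[List[str]], None]


class SkillLibrary:
    """Stores verified skills by name so later skills can reuse them."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def add_if_successful(self, name: str, skill: Skill, succeeded: bool) -> bool:
        # Only skills whose task actually succeeded are kept.
        if succeeded:
            self._skills[name] = skill
        return succeeded

    def get(self, name: str) -> Skill:
        return self._skills[name]

    def names(self) -> List[str]:
        return sorted(self._skills)


library = SkillLibrary()


def craft_stone_sword(trace: List[str]) -> None:
    trace.append("craft stone sword")


def craft_shield(trace: List[str]) -> None:
    trace.append("craft shield")


def combat_zombie(trace: List[str]) -> None:
    # A more complex skill composed from skills already in the library.
    library.get("craft_stone_sword")(trace)
    library.get("craft_shield")(trace)
    trace.append("attack zombie")


library.add_if_successful("craft_stone_sword", craft_stone_sword, succeeded=True)
library.add_if_successful("craft_shield", craft_shield, succeeded=True)
library.add_if_successful("combat_zombie", combat_zombie, succeeded=True)
# A failed attempt never enters the library, so nothing can build on it.
library.add_if_successful("mine_diamond", lambda trace: None, succeeded=False)

trace: List[str] = []
library.get("combat_zombie")(trace)
print(trace)
print(library.names())
```

The last call to `add_if_successful` illustrates the downfall mentioned above: a skill that was never learned successfully is simply absent, so the agent has nothing to retrieve for it.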

So this is the other paper that I covered last time. It’s called Ghost in the Minecraft, and I like this paper a lot, even more than Voyager actually, because I think they got quite a lot of things right. One thing they did was sub-goal decomposition. Given a goal, for example to craft a wooden sword, maybe I first need to find wood, so I need to chop trees first, and so on. Based on the Minecraft Wiki, you can decompose your main goal into sub-goals.

This is also why the Minecraft environment might be a bit too easy: for a given item, the path to craft it is obvious. There’s no stochasticity or anything; you just need to fulfill your sub-goals and then you can slowly fulfill the main goal. But this lends itself to the idea that if you decompose into sub-goals and solve them, you can solve the main goal, and this is what the decomposer does. Later you’ll see that JARVIS-1 also did something similar: they decompose the goal. Then, based on this LLM planner, you map the sub-goals into structured actions. This is what I would like to call a domain-specific language, because within the Minecraft domain you have certain functions that
you call, like explore, mine, craft and dig. You just need to map from the set of sub-goals into a set of these structured actions. For example, if the goal is to craft the wooden sword, you can do something like explore, then go to a tree, and then craft the sword. So you can give it a list, a sequence of functions, and it can execute that sequence. Within each structured action there is perfect execution: what to do in the environment to fulfill the goal is already hard-coded in the set of structured actions.

Later you’ll see that it’s this part here that JARVIS-1 doesn’t do well in, because JARVIS-1 uses a controller to map directly to keyboard and mouse, so it’s not hard-coded. I would say JARVIS-1 got most of this pathway correct. It’s just at the last part: because they tried to do a direct link to the keyboard and mouse, they didn’t learn that part that well. Honestly, the paper isn’t very well written; they don’t even explain how they train the controller. So this controller part is the downfall of JARVIS-1. If they actually incorporated some elements of what Ghost in the Minecraft did, I’m quite sure that the Minecraft bot would be superhuman, because it is already
doing things quite well.

This is the last part, and it’s quite an important point: in a learning system, how do we train all these components together? In previous systems, like Ghost in the Minecraft, all these things are given. How do we learn them? I think we haven’t solved this part yet. But how to use the LLM to map sub-goals into actions, I think we more or less know how to do, at least for Minecraft.

One pitfall of earlier methods is that they use things like LiDAR rays, the laser rays in the game, to detect objects, and sometimes this inadvertently leads to some cheating, because rays can sometimes go through blocks. They try to avoid it, but sometimes it still happens. The idea was that they wanted to turn everything into text back then: using LiDAR rays you can describe what blocks you have, what their characteristics are, and so on, all in text, so you can use the large language model to process it.

However, this is not very realistic, because in real life you don’t really have laser beams telling you, “Hey, the door is over here, 270 degrees from me.” What we have in real life, if we are talking about an embodied agent, is our vision sensor, and maybe also sensorimotor information, where you already know how many degrees your head is tilted and so on. So you have all these internal sensors, but you won’t have laser beams telling you where things are. So I think what JARVIS-1 is doing is one of the first steps leading us to better embodied agents. So let’s move on. Before we
move into JARVIS-1, I just want to highlight that Ghost in the Minecraft actually did much better. Just looking at the task of crafting the diamond pickaxe: for Ghost in the Minecraft, if you look at this little bar here (I know it’s a bit hard to see), it’s about a 50% success rate if you read it off. Compare that with VPT, the OpenAI model trained from videos, which has about a 6% success rate. In the JARVIS-1 paper they got a 12.5% success rate: better than VPT, but not as good as Ghost in the Minecraft. The reason, as I explained, is that the controller is the letdown. Ghost in the Minecraft uses structured actions; JARVIS-1 doesn’t. However, I think we can learn some things from the memory part of the JARVIS-1 process.

So let’s move on to JARVIS-1. Actually, before I touch on what JARVIS-1 is, do you have any questions on Voyager and Ghost in the Minecraft? These are the foundations for what we are going to talk about today. All good?

All right, I’m going to talk about JARVIS-1 now. What they do is that they don’t do full end-to-end training. Similar to Voyager and Ghost in the Minecraft, they take pre-trained language
models. Over here it is a pre-trained multimodal language model. There’s a model called MineCLIP, which interprets Minecraft images. It is already pre-trained on lots of Minecraft images, and it helps to map an image to an embedding space quite well. For the language model they still use GPT-4. They also tried open-source models like Llama 2, which didn’t work too well, so GPT-4 is still better.

In terms of what they did: they use the pre-trained models to interpret images and text, and do the planning with large language models. The plan is a sequence of sub-goals, for example “craft the crafting table”, and the goal-conditioned controller then maps each sub-goal to a series of keyboard and mouse steps. The other thing is that JARVIS-1 has something called a multimodal memory, which stores both images and text. Later you will see that they also do some form of retrieval, something
like retrieval-augmented generation, and this allows them to leverage the memory to make better decisions. This memory also allows self-improvement. I like this a lot, because I have always believed that memory is learning. If you look at my other videos and what I’ve talked about, I think the key to creating fast, adaptable agents is to improve the way we store and retrieve memory. JARVIS-1 has taken the first step there. I’m only about 20% satisfied with what they did, and I’ll explain why I’m 80% not satisfied, but the idea of putting both image and text in the memory and using that to condition the plan is a very nice one.

One of the huge breakthroughs of JARVIS-1 is that they don’t even hard-code the functions needed. Unlike Ghost in the Minecraft, there’s no hard-coded mapping from actions to keyboard and mouse. Voyager is a bit different because it writes code that drives a Minecraft bot, so we don’t compare that one. But in terms of how we interact with the environment, JARVIS-1 is the most native, and I would say it’s also quite hard to train. So it’s quite amazing that they achieve quite good performance even without hard-coding a lot of things. There’s quite a lot to learn from this paper. Any questions so far on JARVIS-1?

All right, before I move on to the details of JARVIS-1, I’d just like to touch on the similarities with what I would consider my own research. This is called
Learning, Fast and Slow. Basically, the idea is that we need memory, because neural networks take a very long time to learn. If you train a neural network on something, you realize you need multiple epochs before the weights are updated to reflect the new information. If you use memory, however, it is instant: you can use it straight away to condition your plan and so on. Over here I propose goal-directed learning, which is also what JARVIS-1 did. Everything is based on goals: the controller uses goal-directed conditioning, and for the plans you have a main goal which you then decompose into sub-goals. All of these are great.

The idea behind fast and slow is this. In the memory part, you have the start state and the end state. For example, if you want to go from state one to state three, you sample your memory; maybe you get a transition from state one to state two, and then you sample again and get one from state two to state three. Based on these two samples, you know that you can go from state one to state three via state two. If this is the fastest path, I block off all other paths, and I get self-improvement by choosing the shortest path.

I’m bringing this up now because JARVIS-1 doesn’t do this, so I think they are missing out on this part. If we could do this, I’m quite sure we could improve JARVIS-1 even more. So the idea behind Learning, Fast and Slow is using memory for self-improvement and for fast learning, and if there is a path in memory, you override the neural network action, which is more like the System 1, gut-feel kind of pathway. Again, I’m not going to go into detail on this, but keep this framework in mind as we review the paper, because JARVIS-1 has gotten some components of memory right but doesn’t have the full picture yet. I don’t think JARVIS-1’s system is
the way to solve it, but they have taken the first steps.

So how do we look at the capabilities of the agent? One way is to look at whether it can craft the technology tree. In this case the technology tree is the set of items that can be crafted. You can see over here that they managed to craft quite a lot of items; in fact, they crafted everything in the Overworld. The Overworld is the regular world you start in; there’s also the Nether in Minecraft, for which you need to build a Nether portal and so on. Over here you can see that they managed to craft all the diamond items, though sometimes with something like a 6% success rate, while for the basic wooden items here it’s almost a 100% success rate.

You can see that in order to craft an item you need certain prerequisites, and solving the entire tech tree is not easy, because sometimes the dependencies go about 10 steps deep; you can see the arrows. Ghost in the Minecraft actually solved everything, because they use sub-goal decomposition, and if your sub-goals all work, you will solve the main goal. Here they also solve everything, but not with as high a success rate as Ghost in the Minecraft, and the key downfall is the controller. But the fact that you can unlock the entire tech tree is already quite impressive and highlights the usefulness of sub-goal planning.

The first thing we’re going to talk about in JARVIS-1 is called situation-aware planning. It sounds very complicated, but the main thing is
basically updating the plan based on what you see in your environment. If you only have a GPT-based planner that produces the plan at the beginning and doesn’t change it at all, you’ll find that most of the time you will fail. What happens is that you might want a certain plan at the beginning, but sometimes the environment doesn’t support it. Let’s say the plan is to find a sheep, but there’s no sheep in the plains; then how are you going to do it? You can’t. You need to update your plan accordingly. You need to say, “Hey, there’s no sheep here, maybe I should go somewhere else to try to find one.” If you plan only at the beginning, it may not work.

Let me ask a question. Let’s say you want to achieve a certain goal in life, for example to retire in 10 years. How many of you think that whatever plan you have now will still be the plan you are following in five years’ time? Is anyone absolutely certain that the plan you have for your life right now will be the same five years from now? Anyone? Okay, how many of you think that the plan you have for your life will change in five years? Just raise your hand. No response? Okay, one yes. Anyone else? Okay, another yes.

So this is the same idea. You don’t want a static plan, one that is fixed at the beginning and then executed in full, because the environment may be different from what you think it is. When you do your planning, you try as far as possible to simulate the future a bit, but you’ll never be sure. So as an experiment
to compare this (I think the result is quite obvious) in Minecraft, they run a situation-aware planner, which is JARVIS-1, against a non-situation-aware one, which just uses GPT to generate the plan and does not update the plan at all. You can see that for the obtain-diamond task, a human is able to do it with about a 12% success rate in about 10 minutes. JARVIS-1 needs a bit more time because of exploration, so it takes about 60 minutes. But GPT with no update to the plan at all has a 0% success rate. Diamond is a very difficult item to get, because you need to go a few layers down from the surface into the underground, and the diamond ores are not in the same place all the time, so you need to adapt your plan accordingly. 0% if you don’t adapt the plan at all; that’s quite harsh.

For the stone, iron and diamond crafting tasks, you can see that if you don’t update your plan, stone is actually not bad: you still get a 20% success rate. But if you update your plan according to your context, you get a success rate of 80-something percent. Iron goes from 2% to about 30%, and diamond from 0% to 9%. This highlights, and let me just reiterate this, the importance of environment feedback for conditioning the plan. There’s no point planning if you don’t get feedback from the environment, because it’s like a leader who doesn’t see the reality of the situation but just gives plans
based on what he or she thinks it is, so the plans don’t match the reality on the ground.

Situation-aware planning works something like this. You have an actor, or let’s call it an agent. The agent gives you an action, the action goes into your environment, and the environment gives you an observation that goes back to the agent. If you do a sequence of actions without taking the observations into account, you might end up with an environment mismatch. A common point made about large language models is that they are not situated in the environment itself, and that is why they may not be the best tool for embodied agents if you don’t update their plans. But the moment you update the plan, by giving the environment feedback, the environment observation or an error message to the agent, the agent is able to dynamically adjust the plan, and this is very important.

Memory also helps here, because the agent can use the memory of the history of what happened before to get a better plan. For example, if you keep walking straight and you bump into a wall, then with a memory trace of what you did before you will know, “Hey, I shouldn’t continue going straight because I will keep bumping into the wall; I should try something else.” But if
you don’t have this memory, then you may not be able to do it. This is actually the core of the ReAct framework. I’m not sure if you have all heard of ReAct, but it is a framework where you do something like this. You have a thought, like “There’s a wall, I need to go through the door.” Then you have your observation, “The door is locked,” and then an action, “Find the key to unlock the door.” So this is the thought, observation, action pattern, and that’s the ReAct framework. Through the observation of the environment, it corrects the plan, which is the thought itself, and gives you an action that is grounded in the environment. Through this observation of the environment, the agent is able to succeed in more tasks, and that’s why the ReAct framework is so helpful.

I always like to compare this with the OODA loop, which is used in the military: observe, orient, decide, act. In this framework you observe the situation, you orient yourself in the situation, you decide what to do, and then you act on it. The OODA cycle is very similar to the ReAct framework. Again, you can see that observing and updating your decision based on the
environment is very important.

There’s a question: can you explain the difference between online and situation-aware planning? Any planning can be online. Online means that I plan on the fly. If I do my planning one action at every time step, even though I may already have had a plan before that, that’s also online planning. Situation-aware planning means that at every time step you take into account the feedback from the environment. So online planning may not be situation-aware: you might be planning online without caring about environment observations.

Situation-aware planning is not new, by the way. As I said earlier, the ReAct framework already does it, and Voyager and Ghost in the Minecraft also do it. Voyager does it by feeding the code error message back to the agent, because Voyager generates code. Ghost in the Minecraft feeds back “the plan failed because of this part”, because the environment gives an error message as well. So both have this error-message mechanism, and situation-aware planning is just saying that we should continue doing what we have done earlier with large language models. So I don’t think there’s anything new here. What is interesting is that this is the only study I have seen that compares situation-aware against non-situation-aware planning, and you can see the difference: if you don’t do the dynamic planning, you basically don’t solve a lot of things. If you are interested in creating embodied systems, or systems that act online in a very dynamic environment, taking in the environment feedback is a must. If not, you will get very lousy results. I hope that’s clear. Any questions on this? Okay, let’s move on.
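The difference between planning once and situation-aware planning can be sketched with a toy loop. This is only an illustration under stated assumptions: the two-biome world, the `plan` function (a stand-in for an LLM planner call) and the action strings are all invented here and do not come from the JARVIS-1 code. The only thing the sketch shows is that feeding the observation back into the planner lets the agent recover from “no sheep here”.

```python
from typing import Dict, List, Tuple


def step(state: Dict[str, object], action: str) -> Tuple[bool, str]:
    """Toy environment: sheep only exist in the plains biome."""
    if action == "find sheep":
        if state["biome"] == "plains":
            state["sheep_found"] = True
            return True, "found a sheep"
        return False, "no sheep in this biome"
    if action == "walk to another biome":
        state["biome"] = "plains"
        return True, "arrived in plains"
    if action == "shear sheep":
        if state["sheep_found"]:
            state["wool"] = int(state["wool"]) + 1
            return True, "got wool"
        return False, "no sheep nearby"
    raise ValueError(f"unknown action: {action}")


def plan(goal: str, feedback: List[str]) -> List[str]:
    """Hypothetical planner: a stand-in for prompting an LLM with the feedback."""
    if "no sheep in this biome" in feedback:
        return ["walk to another biome", "find sheep", "shear sheep"]
    return ["find sheep", "shear sheep"]


def run(situation_aware: bool, max_replans: int = 3) -> bool:
    state: Dict[str, object] = {"biome": "desert", "sheep_found": False, "wool": 0}
    feedback: List[str] = []
    steps = plan("obtain wool", feedback)
    replans = 0
    while steps:
        action = steps.pop(0)
        ok, observation = step(state, action)
        feedback.append(observation)
        # Situation-aware: a failed step triggers a new plan that sees the feedback.
        if not ok and situation_aware and replans < max_replans:
            steps = plan("obtain wool", feedback)
            replans += 1
    return int(state["wool"]) > 0


print("plan once:", run(situation_aware=False))
print("situation-aware:", run(situation_aware=True))
```

With the fixed plan the agent keeps executing steps that cannot work in the desert, while the situation-aware run replans after the first failure and reaches the goal.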

This is the cool stuff: the overview of the entire system. They use memory a lot, and I like that a lot; I love memory. Let’s take a look at the left diagram first.

You will see that over here we have a task; for example, the task could be “craft a wooden pickaxe”. Then we make a plan. This plan is the sub-goal planning: you plan to achieve a certain sub-goal, for example “collect wood”. After that you go into the controller. Before that, though, there’s also the observation that goes into the memory-augmented multimodal language model. The observation is RGB pixels plus some environment state in text. So based on these two, the task and the observation, we do the planning, and then we use the controller to come up with low-level actions, the keyboard and mouse, which play out in the environment.

Now let’s zoom in on this part here, the memory-augmented multimodal language model. What exactly
is in here? You can see that, given the environment state and the task itself, which are the vision and the language, we go through our memory. This is something like retrieval-augmented generation: given the current state of the environment, we look through our memories and see what’s similar. They use cosine similarity over the embedding space. There are two embedding spaces here, one for text and one for images. How they retrieve I’ll go through in a separate slide, but they retrieve relevant memories from the multimodal memory and use them as context, the same as in RAG, to make a plan. The plan comes out as a sequence of steps. The planner gives the plan, and the first step of this plan is passed to the controller through language. So you can prompt the controller with “collect wood” in plain language, and the controller will then do it.
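The retrieve-then-plan flow described above can be sketched as follows. This is a minimal sketch, not the paper’s retrieval rule: the two-dimensional embeddings are made up, averaging the text and image cosine similarities is my own assumption, and reusing the retrieved plan directly stands in for the step where the language model plans with the retrieved memories as context. `MemoryEntry` and `retrieve` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MemoryEntry:
    task_emb: Sequence[float]   # text embedding of the task (made-up values)
    image_emb: Sequence[float]  # image embedding of the situation (made-up values)
    plan: List[str]             # the plan that worked in that situation


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memory: List[MemoryEntry], task_emb: Sequence[float],
             image_emb: Sequence[float], k: int = 1) -> List[MemoryEntry]:
    # Assumption: score = average of text similarity and image similarity.
    def score(entry: MemoryEntry) -> float:
        return 0.5 * cosine(entry.task_emb, task_emb) + 0.5 * cosine(entry.image_emb, image_emb)

    return sorted(memory, key=score, reverse=True)[:k]


memory = [
    MemoryEntry([1.0, 0.0], [0.9, 0.1],
                ["collect wood", "craft planks", "craft wooden pickaxe"]),
    MemoryEntry([0.0, 1.0], [0.1, 0.9], ["find sheep", "shear sheep"]),
]

# Current task and observation, embedded with hypothetical encoders.
best = retrieve(memory, task_emb=[0.9, 0.1], image_emb=[1.0, 0.0], k=1)[0]

# Only the first step of the plan is handed to the controller, as plain text.
first_subgoal = best.plan[0]
print(first_subgoal)
```

Passing only `first_subgoal` onward mirrors the point in the talk that the controller receives one sub-goal at a time in free text, which leaves room to replan before the next step.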

This controller takes in a goal in free text and outputs actions. How they train it is not in the paper, but as you can already see from what I talked about earlier, the controller is the downfall of this entire method. If you could have a better controller, I think it would outperform Ghost in the Minecraft.

The memory part, I will highlight again, is very important. Without this memory the performance is not good, so you can see that memory is a very useful thing here. Later you’ll see some results from the ablation studies on memory.

Another thing to note, since memories are so important: think about babies. When they come into the world they don’t have memories. They don’t know how to walk, how to eat food or how to talk. How do we get this basic stuff taught to babies, or in this case to an agent that has no knowledge of the world? How do we teach the agent how to collect wood, how to hunt sheep and so on? We can’t just download everything into them; that’s a bit like cheating. That would be something like what Ghost in the Minecraft is doing: they already have a structured function that can do it, so the agent, or the
baby, comes into the world already knowing some stuff. That’s one approach. The other approach is what Voyager is doing, and also what this method is doing. For this part, this method is something like Voyager. You have a task pool, which is already filled with tasks from the Minecraft benchmark, and you select from it. The memory records which tasks have succeeded, something like the skill library. Based on which tasks have succeeded, I then ask the large language model to choose: the LLM decides which tasks are manageable. So again we have to rely a bit on the LLM knowing which tasks can be done. This step is not fail-safe; it might fail.

The idea is that after you select some tasks that can be done, and this is a very interesting step that I agree with a lot, you generate a swarm of agents, each with a random task from the task set and a random starting point. Why do we do this?
You see, human society is not intelligent because of only one person. We are intelligent as a group: a collective of individuals all doing their own different things, who eventually share knowledge so that everyone improves. The idea is similar. You cannot just have one agent exploring the world; it would take forever. Think about it in human society: if it were just you, just one person, would we have invented computers? I don’t think so. You need someone who can do electrical circuits, someone who can do the coding, someone who can design the structure of the computer, and you also need someone to manufacture them. If we don’t have enough people doing the different parts of the whole chain, it’s very hard for one person to know everything.

Over here the idea is slightly different. To make sure that you can do things like “collect wood”, it might take too long for one single agent to discover it on its own. So you could have a swarm of agents, a group of them, all exploring the environment, and if one of them succeeds, then everyone succeeds. So it’s a way to improve the success rate and also to collect experiences. You can see that in the memory they collect here, only successful tasks are kept. This is something I disagree with, because in this case you only collect experiences that work out. What about those that don’t work out?

right, so I think there are a lot of wasted experiences here. But the idea of distributing the work, generating multiple agents to explore an environment and using the best one to update the rest, I think this idea is good. This idea will likely be useful for artificial

superintelligence, where as a group all the agents learn stuff and we keep updating some agents with the best knowledge, so that as a group there will be collective improvement. Why do I say some agents? It’s because if we update all agents

together with the best improvements, there’s no diversity anymore. There needs to be some diversity in a system that can achieve artificial superintelligence for diverse environments, because the moment the environment changes you want some agents that can adapt. Over here they’re not interested

in environment change, so they don’t do it. Why are they not interested in environment change? The Minecraft environment doesn’t change; it’s always the same environment. So over here all the agents are the same, they all get updated with the same memory, and so on. But what

I’m talking about is more like future plans. We are not going to create agents just for Minecraft; that’s too small. We’re going to do it for the real world, so this needs to improve a bit. And what I just said, diverse memories, diverse agents, update

some agents, I think these processes we need to investigate more. I believe using memory to improve agents is great, but more importantly we need a diverse set of agents so that we can counter different changes in environments. I talked quite a bit over here, so in summary: use memories

to create better plans on the left, and on the right use a group of agents and an LLM to select tasks that are manageable for the agents, to build up this memory. Any questions on this slide? Because I think this is more or less the summary slide for JARVIS-1,

or anything you want to talk about. Maybe, John, I just want to add: what you said is that it’s not just success but also the obstacles that other agents face. I also believe, just like you, that obstacles that agents were able to overcome or

not able to overcome can shed a lot of light on the collective wisdom and intelligence. I don’t know if you have anything to add in that area. Yeah, definitely. I believe we should collect all kinds of experience, success or failure. You can definitely

learn from failure: you learn what not to do, or you basically learn how to get to somewhere else. There’s this thing in reinforcement learning called hindsight experience replay; let me just write it down here. Basically it’s saying that if you

fail to achieve your initial goal, let’s say I want to make my hand touch this other part, and I fail to reach that and reach here instead, what did I learn? I learned how to get here. So you cannot discount that you have learned something: instead of reaching

there, you actually reached here. In my Learning, Fast and Slow, I actually learn to reach any single point in the trajectory, so I learn even more efficiently than hindsight experience replay. But over here you can see that, because they only store the successful tasks, just like Voyager, they

lose out on how to do certain other things. If I can get to forest A in another experience, maybe forest A can be a bootstrapping point for another task. So that is missing here. We are only heavily focused on task

success, which I don’t think is the best way to learn. So I definitely agree with you here. Okay, any other points? If not, I’ll move on. Okay, let’s move on then. So what is the observation space of JARVIS-1?

So, as I already said earlier, you have this RGB frame, which is a great improvement, by the way. It’s not easy to process this. They use MineCLIP to convert this into a suitable embedding space. MineCLIP is trained using lots of image samples, but,

granted, we’ll just give that to them. This is not easy to do; we haven’t solved images yet. Next we have the position of the player, the location. You can read the rest yourself; the rest are in

text, and basically they just tell you some information about the game. One thing I would like to highlight is that this is quite realistic, because it’s the same information the player gets. There’s nothing here that the player cannot see, just that the player may

not be looking at things like the XYZ coordinates. Usually you don’t really look at these; you could have them, but you don’t really need to use them. Like humans, we navigate by roughly mapping everything in the world to some

position in our heads. We don’t really see this input sometimes, but I still think this can be realistic. In a true embodied agent you can put a GPS there; I think it’s fine. So I think this is great.
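
As a rough picture of the observation described here, the sketch below groups the three kinds of input: the full-screen RGB frame, the player position given separately, and the remaining text information. The class and field names (Observation, frame, position, info) are illustrative assumptions, not names taken from the JARVIS-1 paper or its code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # one (R, G, B) value, each 0-255


@dataclass
class Observation:
    """Hypothetical container for one step of input (all names are assumptions)."""
    frame: List[List[Pixel]]                 # full-screen RGB pixels, as rows of pixels
    position: Tuple[float, float, float]     # player x, y, z, supplied separately
    info: Dict[str, str] = field(default_factory=dict)  # remaining text information

    def frame_size(self) -> Tuple[int, int]:
        """Return (height, width) of the RGB frame."""
        height = len(self.frame)
        width = len(self.frame[0]) if height else 0
        return height, width


# A tiny 2x3 black frame stands in for a real screenshot.
obs = Observation(
    frame=[[(0, 0, 0)] * 3 for _ in range(2)],
    position=(12.5, 64.0, -3.0),
    info={"note": "text the player could also read on screen"},
)
print(obs.frame_size())
```

Keeping the position and text fields separate from the frame mirrors the point made above: everything in the structure is something a human player could also see.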

So we are very, very close to embodied agents now with this paper. How do they process the images? This is the part where they basically convert to text. What they realized is that a direct end-to-end conversion from image

to text is not good, because the LLM tends to hallucinate a lot of things. So they did this as a pipeline: rather than captioning the scene directly, they first extract keywords using the Minecraft Wiki, and then use these keywords to describe the scene.
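
The two-step idea, keywords first and description second, can be sketched as follows. This is only an illustration under assumptions: the per-keyword scores stand in for whatever image-text model would check each wiki keyword against the frame, the threshold is arbitrary, the keywords are made-up examples, and a fixed template replaces the language model that would normally compose the sentence. None of the function names come from the paper.

```python
from typing import Dict, List


def extract_keywords(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep keywords whose (hypothetical) image-match score passes the threshold."""
    hits = [kw for kw, score in scores.items() if score >= threshold]
    # Strongest matches first, so the description leads with the clearest entity.
    return sorted(hits, key=lambda kw: scores[kw], reverse=True)


def compose_sentence(keywords: List[str]) -> str:
    """Build a constrained scene description from the detected keywords only."""
    if not keywords:
        return "I cannot identify anything in the scene."
    return "I can see " + ", ".join(keywords) + "."


# Stand-in scores; a real system would compute these from the current frame.
keyword_scores = {"oak tree": 0.81, "cow": 0.74, "lava": 0.05, "village": 0.12}
found = extract_keywords(keyword_scores)
print(compose_sentence(found))
```

Because the sentence is built only from keywords that passed the check, the description cannot mention an entity that was never detected, which is the point of constraining the generation.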

So it’s like: is the acacia tree there, is a sheep there? They didn’t really describe how they did this, but I suppose they basically asked whether certain keywords are in the image. So if, let’s say, you know that

acacia tree and sheep are in the image, you can then use GPT to construct the sentence “I can see sheep in the acacia plains”. This is what I call entity extraction followed by sentence composition. They do this because

if you ask the LLM to directly do the sentence composition, it is horrible. I can understand this. If you use image question answering stuff like BLIP, where you ask your image questions, or now the recent ones like LLaVA

1.5 or GPT-4V, and you ask your image certain questions, sometimes if you ask stuff that is too complicated you get nonsense responses. What may be better is to ask targeted, simple questions, and after that do the processing to stitch them

back together. This is exactly what they do. They also say that they convert situation details like biome and inventory status into text using templates. But what templates? That’s not in the paper. The idea, though, is that they

constrain what the output of the large language model will be using these templates, and I think this is awesome. I always believe that constrained generation is the best, because if you let LLMs generate freely you will get lots of hallucinations. So this is how they did

it: they constrain the generation into text. Great. So this is how they interpret the image: they interpret everything as text, and basically use the multimodal language model to convert it into a plan. This is basically GPT-4V. Given the

task instruction and all the inputs, meaning the text input and the image converted to text, you use this to condition plan generation. So again, you can see that this is not exactly full image processing; it’s

not image all the way to plan. You go from image to text and then to plan. So the question is: will we lose information? Maybe you want to think about it. If we use text to describe the image, will we lose out on stuff? The answer is yes, so

you will lose out on certain things. You will maybe lose the position of the sheep, the position of the tree, and so on. You can say “I can see sheep in the plains”, but where is the sheep? K, you asked how they deal with

3D. The image processing part can process 3D stuff, because they use a wide variety of images from the Minecraft environment. So they can see, oh, this is a sheep, this is a tree. It’s the same as how

you do ImageNet. ImageNet is of a 3D kind of world, right, our world is 3D, but once you take it into a 2D picture, your car appears in various orientations. As long as you have enough samples at each orientation, you can tell it is a car regardless of orientation. So I think this is the

same thing that is used here. But what I’m saying is that if we convert the image into text, we might lose out on certain things, because there are different ways of processing images. You can process an image at the scene level, you can process an image at the object level, you can process an image at

the pixel level. Sometimes you’re interested in different things in the image, like whether my inventory has certain items and so on. What they do is bypass this by using text inputs, because most of the important inputs in the game are from text. For the scene part, what

you really need to know is something like where my enemy is in relation to me, and so on. I think this part might be a little lacking if we do this conversion into text, unless they hard-code it in. Because they don’t have any

details about this in the paper, I cannot comment much on it, but I’m sure there’s some hard-coding here, because just “I can see sheep in the acacia plains” is not going to be enough to make the plan to get to the sheep. You

need to know where the sheep is, which location, and so on. So I do think this part could be improved. We could ask not only the macro questions; we could also ask the micro questions, like where is the sheep,

where is the tree. Stuff like this could give us a better resolution on how to form our plans. So I quite like them using the image directly, but the way we process the image could be better; it could have different scales of abstraction. Okay, any questions so

far? All right, let’s move on to the next one. This is subgoal planning, very similar to Ghost in the Minecraft, where you split your main task into subgoals. Let’s take for example the enchanting table. To get an enchanting table

you need the book, the diamond and the obsidian. The question is, how do we know which subgoals to use? The thing is, we don’t. The agent doesn’t know. If you haven’t crafted an enchanting table before,

there’s no way you would know that you need these three items unless the game tells you somewhere. I personally haven’t played Minecraft, but I believe for Minecraft this way of crafting is maybe in

the wiki already, and players look at the wiki to play. Or if you were to just take items at random and craft them together, maybe people have found things out and then added them to the wiki. Has anyone played Minecraft? How do people know the new item recipes? Is

it stated in the game or anything? If anyone has played Minecraft before, you can chime in. If not, what I’m going to say is that the item recipes may be found through trial and error, or maybe the game will give them to you,

or through the wiki. So it is not very clear how you do it. But right now, in order to split into subgoals, I’m very sure that in this part here you consult the wiki. I believe that they consult the wiki, as well as there being millions

of YouTube videos that explain step by step how you would get to a certain goal, and I think they have a way to encode the YouTube videos and feed them back in here as instructions, like a manual. Over here the reasoning process doesn’t use YouTube videos, so

in this case I believe it’s just the Minecraft Wiki for this JARVIS-1 paper. Maybe other papers use the videos. Okay, thanks. So as you can see, this subgoal thing is not very realistic. How many times in real life is your subgoal so obvious?

Let’s say I want to retire in 10 years. What is my subgoal going to be, earn a million a year? How are you going to quantify this? In the Minecraft game it is easy, because if they are talking about an enchanting

table, there is a fixed number of items you need to get there. So in this case you can use the Minecraft Wiki. I wouldn’t say this is very impressive; you can actually just use a rule-based approach to split it up, no

problem. You can even split it up all the way to the subgoals over here. They have a limit on when the reasoning stops, and then basically what happens is that you look at what part is in memory, because in memory you

have the plans stored. So for example, if obsidian to diamond pickaxe is in there, you can refer to the plan over here; you can basically put it in your context and get it. What is more important is actually this

part here that is not in memory. How do you do this? This one you basically need to do on your own. So over here what will happen is we will try to get the subgoals that are present in memory. You see all this,

stick over here, diamond is present in memory, so in my query to retrieve the memory I try to retrieve the diamond. I like this part a lot, because it means that if you have succeeded in a certain task before, why

reinvent the wheel? Why try to find out this sequence of unknown steps when you have already found it? What you can do is just take it from memory. Like leather: I’ve created leather

before, so let’s try to retrieve this part from memory. I’ve done paper before, so let’s retrieve this from memory. I’ve done iron pickaxe before, so I can retrieve this from memory. So what they do is append to the initial query to the memory whatever

subgoals have been done before, like diamond, leather, paper, pickaxe, and we try to refer to this multimodal memory, which stores the task, like what has been created. If there’s a match in the task, for example iron pickaxe, and you have iron pickaxe in

memory, you can directly pull out the chunk that tells you about iron pickaxe. This is also very similar to Voyager extracting out functions that are useful for tasks. So in this case we refer to memory, and in memory is all the stuff that has

succeeded before. We basically take in the successful plan and condition on it in a RAG fashion, a retrieval-augmented generation fashion, so that we can use the existing successful plan to create our final plan. One thing I don’t like is that in Ghost in the Minecraft,

what happens is that they execute each subgoal one at a time. Here they try to do the whole plan at one go, the entire giant plan at one go, and that’s not a great idea, because it is better to

execute things in parts. That makes life easier for the agent than creating the entire plan by itself; that’s just a huge plan. And from what I understand from the paper, they just take the entire query and concatenate it with all this. This might cause some issues in

cosine similarity matching, because you can have diamond, leather, paper, iron pickaxe. If you have too many items here, too many subgoals, you might miss out on the cosine similarity to each subgoal, because

if you encode all the subgoals together in one chunk, your embedding might be different from encoding them separately. I’m not sure how they do this here, but I would recommend encoding each subgoal one at a time. Like iron pickaxe: you encode it into some

vector embedding and then you compare it with your memories. Maybe iron pickaxe, you encode this (let me use a different color), you encode iron pickaxe into a vector embedding, and then you encode this task entity, like stone pickaxe, into another embedding. I’d recommend doing this, and

then you do cosine similarity to check whether they are similar. This is what I would have done: cosine similarity per subgoal, to find similar subgoals in memory. So this is what they do: they match the query text against the multimodal memory’s

task. Then there’s also another check, which I’ll go through in the next slide but will cover here for completeness. They take the image embedding model, which is MineCLIP, map the image to a vector as well, and compare it with this image here to do

a further test. Why is this done? The first check is for task relevance and the second check is for biome relevance, because if the task was completed in the forest, you will

take a plan from the forest; if the task was completed in a cave, you’ll take the one from the cave; and this image similarity will tell you which biome this is. At least this is what I think they are trying to do, and

it works pretty well. I don’t quite agree with the second check, but it works pretty well for Minecraft right now. If you think about it, maybe the image is not needed. What you could really do is just encode your biome into your state.
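
The per-subgoal matching suggested above, followed by the second check on the stored state, can be sketched like this. Everything here is an assumption made for illustration: the bag-of-words embed_text function stands in for a real text encoder, the small state vectors stand in for image embeddings, the threshold is arbitrary, the second stage ranks candidates rather than applying a cut-off, and the memory entries and plan labels are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

Vector = Dict[str, float]


def embed_text(text: str) -> Vector:
    """Toy bag-of-words embedding; a stand-in for a real text encoder."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(subgoal: str, query_state: Vector, memory: List[dict],
             task_threshold: float = 0.5) -> Optional[dict]:
    """Stage 1: keep entries whose task matches the subgoal.
    Stage 2: among those, pick the entry whose stored state is closest."""
    query = embed_text(subgoal)
    candidates = [m for m in memory
                  if cosine(query, embed_text(m["task"])) >= task_threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda m: cosine(query_state, m["state"]))


# Made-up memory: task text, a stand-in state embedding, and a plan label.
MEMORY = [
    {"task": "iron pickaxe", "state": {"forest": 1.0}, "plan": "plan A"},
    {"task": "iron pickaxe", "state": {"cave": 1.0}, "plan": "plan B"},
    {"task": "leather", "state": {"plains": 1.0}, "plan": "plan C"},
]

current_state = {"cave": 1.0}
for subgoal in ["diamond", "leather", "iron pickaxe"]:
    hit = retrieve(subgoal, current_state, MEMORY)
    print(subgoal, "->", hit["plan"] if hit else "not in memory")

# One concatenated query dilutes the match to any single stored task.
joined = embed_text("diamond leather paper iron pickaxe")
print(round(cosine(joined, embed_text("leather")), 3))
```

With these toy vectors the concatenated query scores below the 0.5 threshold against "leather", while the per-subgoal query matches it exactly, which is the dilution concern raised above.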

So I think something’s lacking in the way they store the image. Why do you only store the end state? So there are two things I don’t like. One is that when they retrieve the memory, they seem to retrieve with every single thing concatenated

together in one embedding, all the subgoals together. I don’t think this is a great idea. Second, the image that they store in memory is only the image of the final frame, the 360-degree view around the agent once the task is completed. They don’t store any other

image in the trajectory. I think that’s missing out on some stuff as well. Sum, you asked something: “Are subgoals from a predetermined set in this framework?” You mean, for example, from enchanting table to obsidian, diamond and book, is it a fixed

process, is it already common knowledge? Is that what you’re asking? Yeah, once you break down the bigger thing into subtasks, is it from a fixed dictionary of subtasks? Is that important, or can the subtasks be random English questions? Okay, I think it’s fixed. I’m

quite sure they use the Minecraft Wiki for this. So given the enchanting table, I know 100% that it’s obsidian, diamond and book, without a doubt. That’s also why I think this subgoal decomposition works well in Minecraft, because everything seems to be done sequentially. But I don’t think it

will work well in the real world, unless you also have a wiki for the real world, like: to craft a table you need four legs and one plank. But yeah, this is one of the limitations of this work: it

seems quite clear-cut, a bit too clear-cut for real-world use cases. All right, so this is what I was talking about earlier: how do they store the memory? They basically store the goal that has been achieved, like what has been crafted, and they store the plan. And

this plan is actually the most important, because they use this plan to later condition the generation of the more complicated tasks. So in some sense you take what has been solved and use it to generate something even tougher, because Minecraft is a

progression-based thing. If you store whatever you have done, let’s think of it like this: this is your solved bubble, and in Minecraft the unsolved bubble kind of uses what has been solved. So this is the unsolved bubble, but you need to reference what has been solved here in order to do the unsolved. So this idea of using memory of

earlier tasks to condition the later tasks, I think that’s great for Minecraft. But in the real world your unsolved tasks might not be the same; they might not be bootstrapped off solved ones. You may not be able to do

this. So what you really need in the real world is a way to mix and match earlier solved tasks, or parts of them. For example, if you know how to read a book, maybe you read Harry Potter books and that’s

the only book you have read in your memories, but maybe your new task requires you to read Lord of the Rings. If you don’t store parts of your tasks in your memory, maybe you won’t be able to get the reading part done.

Over here you may be able to do it, because if your subtask here, say this part, is reading, you might be able to take it out for your new task. Maybe it will work. But I still have some issue with storing only the successful goals here in

memory, because I personally don’t think memory should just be about successes. And also, how do you define success in the real world? In Minecraft, yes, I craft a certain item and that’s like a checkpoint. What is my checkpoint in real

life? I don’t know what the checkpoint is. So I don’t think this is the right way to store memory, but it works in Minecraft, because Minecraft is a very linear kind of progression. Okay, so let’s just talk about how they store it

in the memory, and then I will talk about my take at the bottom. Next we have the state, which is the 360-degree view when the agent has completed the task; you can see it over here. So this is how they store multimodal

memory: they store the item and the plan. My gripe is that your image may not be useful, because you’re only taking the last part. You may as well just store the biome. I don’t know whether this is

indeed the case, but I do think the image here is a bit of a by-the-way kind of thing; it’s not really used properly. I can understand why: we haven’t solved images yet, so it’s a bit hard to use images in other

ways. But this way is a bit lacking, I do feel. So my take on the memory is: why just store the successes? Why not make any point along the

trajectory a goal? Then you can learn to reach any point. And also, why not make any point a start point? You see, different environmental states are not stored, so you miss out on all this as well. What I was thinking is, instead of just storing the final goal and

the plan, why not store transitions? You could store from state one to state two, you could store a plan to get there, and then you could store the image trajectory. It’s something like my Learning, Fast and

Slow. So if, let’s say, you want to go from state one to state three, you can retrieve transitions from state one to state two and then state two to state three. With this mix-and-match of transitions you are able to utilize any kind of experience that you

have had before, and you are able to learn from success or failure. There’s no such thing as success or failure here; it’s just transitions. And if you can find a way to chain your transitions from your start state all the way to your end state, which is like from nothing to

wooden pickaxe or nothing to diamond pickaxe, you can just keep retrieving your transitions and you get your actions. The benefit of this transition-based approach is that you can potentially train your model to learn better trajectories.
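
The mix-and-match of transitions described above can be sketched as a search over stored state-to-state moves. This is a minimal illustration of the speaker’s proposal, not of anything in JARVIS-1: the state names, the action labels, and the choice of breadth-first search for chaining are all assumptions, and the image trajectory mentioned above is left out for brevity.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# (from_state, to_state) -> actions that achieved that move. All names are illustrative.
Transitions = Dict[Tuple[str, str], List[str]]


def chain_transitions(memory: Transitions, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over stored transitions.

    Returns the stitched action list from start to goal, or None if the
    stored transitions cannot be chained that far.
    """
    neighbours: Dict[str, List[str]] = {}
    for src, dst in memory:
        neighbours.setdefault(src, []).append(dst)

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for nxt in neighbours.get(state, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, actions + memory[(state, nxt)]))
    return None


memory: Transitions = {
    ("state 1", "state 2"): ["action a"],
    ("state 2", "state 3"): ["action b", "action c"],
    # Recorded in an episode that missed its original goal; still usable later.
    ("state 1", "state 4"): ["action d"],
}
print(chain_transitions(memory, "state 1", "state 3"))
```

Nothing in this store is marked as a success or a failure; an episode that ended somewhere unintended still contributes a transition that a later query can reuse.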

So maybe what they do over here, we cannot discount this. Eventually transitions might be consolidated via reflection to form trajectory paths like this. So I don’t discount that this could be an end state, but I don’t think this should be the start

state. This is just my take on it. I think the memory is very limited here; we could do better. So just wait for my further research on this; maybe I will be able to get a better way to do

this memory. I believe we should store transitions rather than store the entire plan, but maybe memory exists in both ways, just that over here they focus on the plan part. Okay, maybe I’ll open to the floor: any other takes on

this? Okay, if not, let’s move on. So how do we retrieve the memory? As I mentioned earlier, we compare the text embeddings of the text query and the task to get memories above a certain similarity score. Then we use the MineCLIP image embeddings of

the query image and the state image, which is this one here. So there are two steps: first filter by task relevance, then filter by environment relevance using the image, and then eventually we take the plan. How do we generate the memories? As

I mentioned earlier, they have a fixed set of tasks here, and then they ask a distributed set of agents to explore. Having all these agents means that if any of them solves the task, they will learn the plan from each other. This is very

similar to Ghost in the Minecraft; they also have this multiple-exploration part to learn the subgoals and so on. Actually, I realize that although the memory is different from Voyager, which learns the code for each task, it’s the same as Voyager in that it stores only successes.

Yeah, so this is the idea of generating more and more skills, where you can leverage the earlier memories, and you can keep updating these skills or memories in order to perform more difficult tasks. So this works for Minecraft, but if the environment changes a bit, you

know, maybe we want more varied memory. There is also improvement of memory: should you update it? Let’s say your plan takes 10 steps to get a wooden pickaxe, and there’s another plan that only takes five. Should we override the memory with the better one? The thing is, they don’t have to overwrite here

because the subgoal decomposition is perfect. Why? Because you base everything on the Minecraft Wiki. It’s quite obvious how you get to, say, a diamond pickaxe: it is a fixed set of steps, and the fixed set of steps is optimal

already, because it’s based off the wiki. In the real world this is not going to happen. We will need to override memories with better ones, more relevant to the task. If, let’s say, one path is shorter than another path, maybe you want to override in order to get to this goal,

you override the path. So I think all this is not done, and it’s because of the limitations of the Minecraft environment; it’s not realistic enough. Yeah, so one question here. I was really thinking about this when I was reading the paper, so

you’re really right that Minecraft is different. In real life, say for example you have a son who is four years old and loves to kick the football, or soccer ball in this case. So there are two options: you

can let your son play around with the ball and just leave him alone, or the second option is that you hire a coach to coach your son, and hopefully coach him to be a soccer star, right?

So which one is the best option? You don’t know until he’s, like, 35 years old, because there are parents who drill their sons to learn soccer or golf, and they end up having a miserable life. So the point, I think, is that we should keep

both, watch for the consequences, and keep all of the options in memory, tracking the trajectory or the path of all the subsequent encounters with the environment, in order to give not a final but a sequential assessment instead. Back to you. Yeah, so that’s a

valid point. So storing transitions is better than storing the entire trajectory here. I also think the part about not knowing for sure what it is right at the beginning needs to be factored in as well. Right now we have absolute certainty about the plan, like

over here, this part here. Once we have uncertainty, maybe we can do some more reflection and overriding of memories. So this point about memory storage, multiple abstraction spaces to store memory, and how we reflect upon the memory, I’m very interested in it, and I think this is

going to get us to very intelligent agents. What you see here is less than 10% of what could be done. Honestly, the memory here is so limited; I think we can do way better than this. So if no one else

is going to do this, then I will do it. My next research will be something related to this. So one good thing about JARVIS-1 is that it manages to achieve some form of diversity. You can get diversity across various

biomes: you can do wooden pickaxe and it works in all areas. Why does it work? Because multiple agents are trained and you explore multiple biomes. So it is no surprise that it works, because your training set includes all biomes. This is what we

call domain randomization in reinforcement learning: we basically train the agent across multiple environments. Does it generalize from one environment to another? Probably not in this case. Minecraft can be quite different in terms of how you craft different things in different areas. You can have the

same steps, but you need to find the stuff. So I would say that at least one part they got right here is the way they do some form of learning across different environments. Maybe there’s some transfer learning; I think transfer learning is still

possible. And how do they do the transfer learning? By memories. If one agent has succeeded at a task in one environment, the next batch of agents will have access to that memory, and they could perhaps integrate some parts of the plan into the new environment. So I do think

this is crucial: basically training in multiple environments and using memories, so that you can do some form of LLM-based, context-based learning to generate similar plans in different environments. So this is something that is really cool; I think this is something they got

right here. So Ken, you mentioned something: “Hope our memory design will work for Minecraft as well as for elementary school kids for better life learning, but school teachers can learn from simulated examples as well.” Yeah, definitely. I think this

is in line with what I was saying: you need to generate memory based on the environment. There’s no way you can generate memory without experiencing, so increase your exposure, increase your experiences. Actually this applies to us as well: in order to do problem

solving or whatever, increase your amount of experience so you get different views about different things. I’m reading lots of different subjects as well, biology, neuroscience, psychology and so on. Increase your exposure to different ideas, and then you can use those memories to try to piece something together. I

mean, that’s essentially what I’ve been doing the last few years. And the other thing is you need to learn from others. Over here there are multiple agents and they learn from one another. In our case, learn from other people that you think are

relevant. With this in mind, increasing experiences and learning from other people, you can build quite a diverse set of memories, which can be useful for whatever goal you’re pursuing. All right, so this is a new thing in JARVIS-1. It’s called

Reflection and error it’s called selfcheck so what is selfcheck this is something new so um in like Voyager and go in the Minecraft what happens is they have a plan and then if let’s say the plan fails then they go through the environment feedback this part here to

The planner again to redo the plan okay so like this self-explain is like the reflection okay it’s like why did the plan fail and how to update it okay this self check is new okay what is selfcheck self check means that self check is reflecting before the plan is

Executed so while okay what I say is similar to reflection reflection is reflecting after the plan has been executed so what is what is selfcheck selfcheck is saying that okay I created a plan okay I created one plan and you know before I execute it in the environment which can be quite lengthy

To execute no this my plan I created I basically do a simulation I say okay from this to this what materials do I need what materials do I have left this one to this one what materials do I need so basically they play out the entire plan including the materials before and after

Each step so this is a rule based thing okay if something happens that takes in more materials than you have you flag an error and say hey there’s not enough wood what kind of nonsense plan is this so so how how will you view this in real

Life it’s like this um let’s say you have a CEO in the company all right the CEO says he let’s build gb5 and then like the the person in the company might say like the CTO might say hey we don’t have enough compute yeah so so it’s like

Someone will will say like you know this is the self check is like you you have like someone else look through the plan make sure every step is okay yeah so this is the idea and how do they know what materials are needed right never said but um I’m very sure

It’s also based on the Minecraft Wiki so yeah so Minecraft is is it’s difficult but you know like this kind of steps most agents are just using Minecraft Wiki is kind of cheating like because you already know for sure what’s going to happen I mean if you use it from

Memory I can still understand but if you say from Minecraft Wiki that’s like a ground Truth All right so uh but still this self check is a it’s a novel idea I guess you can also do this for other domains as well like if you have certain

Um things that you want to do like uh you’re doing entity extraction from large langage models maybe you could do a Ru based check to make sure that you know the First Cut is all right or if you are doing other things like uh doing sentiment analysis you know you do a

Again a rule based check make sure the output is with positive negative or neutral you know this is like a sort of check and if let’s say your LM doesn’t give you that kind of stuff that you you know based on the rules okay so let me just it’s a it’s a

Rule based check to make sure things are okay all right so if let’s say something is wrong you can feedback an error even before executing it so this is useful for Stuff whereby the execution is very costly and lengthy like for Minecraft you do the plan wrongly you know you’re

Going to take a lot of compute and a lot of time so if you could stop yourself there you could say hey something’s wrong in the Plan update it ah then you will save a lot of effort so this is selfcheck uh any questions on self check I think
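
To make the self-check idea concrete, here is a minimal sketch of a rule-based plan simulation. It is not the paper’s implementation: the `RECIPES` table and the `self_check` function are hypothetical names, the recipes are written by hand for illustration, and the sketch ignores details such as needing a placed crafting table.

```python
from collections import Counter

# Hand-written recipe table (an assumption for this sketch):
# item -> (ingredients consumed, quantity produced).
RECIPES = {
    "planks": ({"log": 1}, 4),
    "stick": ({"planks": 2}, 4),
    "crafting_table": ({"planks": 4}, 1),
    "wooden_pickaxe": ({"planks": 3, "stick": 2}, 1),
}


def self_check(plan, inventory):
    """Simulate the plan step by step and return (ok, message) before any execution."""
    inv = Counter(inventory)
    for step, item in enumerate(plan, start=1):
        if item not in RECIPES:
            return False, f"step {step}: no known recipe for {item}"
        needs, produced = RECIPES[item]
        # Flag an error as soon as a step needs more materials than we hold.
        for ingredient, amount in needs.items():
            if inv[ingredient] < amount:
                return False, (
                    f"step {step}: crafting {item} needs {amount} {ingredient}, "
                    f"only {inv[ingredient]} available"
                )
        # Otherwise update the simulated inventory and move on.
        for ingredient, amount in needs.items():
            inv[ingredient] -= amount
        inv[item] += produced
    return True, "plan is feasible"


bad_plan = ["planks", "planks", "stick", "crafting_table", "wooden_pickaxe"]
good_plan = ["planks"] + bad_plan
print(self_check(bad_plan, {"log": 2}))   # fails at the last step: not enough planks
print(self_check(good_plan, {"log": 3}))  # passes
```

The error message is what would be fed back to the planner, so a bad plan is caught before any costly execution in the environment.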

This is quite a nice idea, actually. The self-check is something I agree with; it is good. So what’s the performance? I don’t have to say too much: they perform pretty well. They did 200 tasks, all the tasks, but they don’t have as high a success rate as Ghost in the Minecraft. You can see on the diamond task they got about 8.99%, while Ghost in the Minecraft got about 50% on the diamond pickaxe. But it’s much better than the native GPT and the ReAct framework, very good already. The rest, Inner Monologue and DEPS, we don’t need to compare; they do quite poorly. Compared with just the basic LLM-based methods, they do way better. I think they are comparable to Voyager, but Voyager didn’t do this success rate chart, so I cannot compare. Ghost in the Minecraft does better, and that is because of the controller, which I’m going to talk about now. Oh sorry, the controller comes later; now I’m going to talk about memory.

You can see the memory part is very important for success. If you look over here, if we don’t use memory we get about 85%, I think, on this wooden pickaxe, and from stone pickaxe onwards you get close to 0%. If we only use the text-based memory, you can see that you get some success, but you won’t do very well. If you do reasoning, and the reasoning process I believe is the subgoal decomposition, the reasoning step is very, very important.

You can get quite high success rates. And then finally, if you use the multimodal memory, you can see that the success rate improves by about 10%. I would say that the multimodal memory, the image memory, gives some context of the biome. At least this is my understanding of it, because they only use the final image frame, which is not going to give you much context of the entire process. But it shows that using memory is important. On the find-the-diamond task, going from 0% without memory to 9% using multimodal memory is a huge difference. And the diamond task is not easy, by the way. People have tried for years to get diamonds. But now that we have solved it, we keep solving it. You can think about it like LLM research or Minecraft research running on memory: we build on people’s methods that work. Once we can find diamonds, we can always find diamonds. So this is also a proof of concept, not in this paper but through a series of Minecraft papers: our memory of how to do it, based on what people have done, keeps building on earlier memories and increases our success rate. So again this shows memory is important.

If you look at the different items here, you can see that over time, as the memory grows, your success rate of crafting different items increases. But take a look at this one here, the wooden pickaxe. Do you notice something weird about this chart? Can anyone tell me what’s different about this diagram for the wooden pickaxe? You see, at a memory size of 100 they actually got a 100% success rate for the wooden pickaxe, but the success rate dips after you have too much memory; it now becomes 95%. So what does this mean? Too much memory may be bad. If you do retrieval-augmented generation, you realize that the more documents you have, the harder it is to retrieve. Same thing here: because the wooden pickaxe is the base level, you have a lot of memories on the wooden pickaxe, and sometimes you might have contamination if you have too many memories on that. So maybe what needs to be done is better filtering; you need to filter more to your task and so on. This kind of shows that memory is not without pitfalls. If you store a lot of memory, you’d better make sure your filtering process is good; if not, you will basically have contamination of memories.

Next, the controller part; this is what I wanted to say. The controller takes in a language goal and the observation and comes out with actions. The thing is, its inability to execute short-horizon text instructions is the weakest point. And how is it trained? We don’t know; there’s no other paper. But I would say that if we could improve the controller, maybe use a hard-coded controller, I’m sure you would get much better results for Minecraft.

Okay, I have more or less come to the end of today’s JARVIS-1 discussion. I think it’s a good paper. It is one of the first to use the image modality in memory, and I also believe it highlights the usefulness of doing planning using subgoal decomposition and of using memory to condition your future plans. All these are very important. We do have about five minutes left. I’m going to go through some discussion questions; feel free to type in the chat or voice out if you have anything to add. If not, I’ll start on the questions now.

First thing: we only use memory in this paper. In the fast-and-slow method that I propose, you have a memory part, but you also have a neural network part that learns from the experiences. If we want to fine-tune this neural network part, how can we do it? Actually the answer is simple: you just need to do self-supervised learning from start state to end state, and output an action. Right now LLMs can do that: you can do instruction fine-tuning, where the instruction is the start state to end state and the output is the plan. So we can do this, not a problem. I’m not sure whether we can do this for the image part, but at least for the text part we can do this fine-tuning if we wanted to improve GPT-4’s performance, or any LLM’s performance, on the Minecraft tasks. I think this is already possible; it’s just expensive.

Next, a more crucial thing: there’s still a ground-truth list of tasks to choose from. So the thing is, how can we learn if we do not have any examples of successful tasks? I think this one is arbitrary for the Minecraft environment because we evaluate by task, so essentially it’s like we are evaluating the agent on our training set, because the task pool contains the test set. So how do we do this? One way is to store transitions and learn from transitions to any end state, rather than just successes. I think this is what’s lacking in this paper; it’s a huge thing that’s lacking, this transition part. We just need to store the transition; we don’t need to store the entire set of actions all the way to the goal, because we may not have clearly defined goals in real life. But we do learn from any experience we have in real life, and we need to put that in here.

Next, the image processing is done by converting to text and then passing it through the LLM. Are there better ways to process it?

In the image domain, yeah, I think multimodal LLMs can process images directly, but we lose out on explainability, so I think it’s a trade-off. We could also process images at various scales, like object and pixel, and get various abstraction spaces. I think that’s possible. So we may not need to process everything as image tokens and lose out on text; we can convert everything to text, but we need to do that at multiple scales, and I think that is lacking in this paper.

Hey John, quick question here. When I was reading this paper I was also relating it to Nvidia. This concept of using a standard format for images as well as 3D and movie scenes is called USD, Universal Scene Description. They have this concept where you can actually search; say, for example, you want to search for a rusty bucket, it can actually pull out a 3D rusty bucket for you. I don’t know if there are ways that we can leverage that whole framework of 3D expression, because eventually it’s not just the image. Say, for example, you have the office and you have two agents working, and the scene changes. They’re famous for how the sun rotates and the shade on the ground, and when that changes, the whole scene changes. I’m not sure if converting image to text is scalable at that point. Back to you.

Yeah, that’s true. If you are interested in all the macro scales or all the micro scales of the image, then maybe you need to preserve the image directly. But I’m talking more about a practical decision-making use case, where usually we don’t need to preserve all the information in the image. Oh, actually what you make is a good point: what if we store the image directly in memory and run inference on the image based on the query? So you don’t have to store everything. That’s exactly what I was thinking, yep. So this is actually in line with memory soup. You see, in memory soup we also store things directly and then we only query at run time, because if you’re doing the multiple scales maybe you miss out stuff. Yeah, that’s a possibility.

Yeah, so good ideas there. We can ask the question and answer only at run time, based on the query we want. Yeah, and also Nvidia has its own engine, right? It has its own inference engine as well that we can leverage. Interesting. I think with all this we are deviating away from human memory, but it’s fine, because human memory is actually quite bad. If we can do this in artificial systems, it will most likely surpass humans. Yeah, absolutely. They even have some research in atmospheric research on how clouds move around in the atmosphere, which is, you know, billions and billions of parameters. Very interesting; I’m quite interested to read their papers. Right, so I’m thinking that the memory we’re thinking of here has to be able to generalize to education as well as to scientific research. Sounds good. I like this idea a lot: basically runtime inference and how to get more from the memory, because I do think memory is a huge part of learning. So you’re absolutely right there.

Okay, anyone else before I move to the last slide? I need another five minutes, so sorry for exceeding the time a little. All right, the last one. This is something that I’ll also be talking about in the Machine Learning Street Talk book club session two days later. This is about how much experience we should share. Knowledge for each agent usually stays in the agent unless you share it with others. In JARVIS-1 they share with all the agents, because all the agents are doing similar tasks. In real life you may not want to do that, because different people have different capabilities, different interests and so on. If you share memories with everyone, everyone’s going to be the same agent. Sharing all memory leads to a lack of diversity, and this is something we don’t really want in a real-world environment, because the environment can change. In Minecraft it’s fine, because the Minecraft environment doesn’t change, so you can have everyone behaving the same way and getting more and more optimal. But in the real world we might want to think about having diversity between agents by not sharing all your memory. You might only share with a select few, so that you have different agents with different memories and different contexts. They act differently and can collectively adapt better to the environment, because some will survive better and some will survive less. So I don’t think we should share experience with everyone; that’s my honest point of view. I think this is an interesting thing and we can talk more about it offline, but if you have any quick thoughts, you can share, maybe one minute. Anyone want to share about this? I think this is key to fast-adapting agents in changing environments: you cannot have all agents on the same memory; all will die together if the environment changes. Thoughts on this, anyone?

Okay, if not, let’s move to the next one. The next one is lookahead planning: how to know which memory trajectory to use. Right now we are just mapping into subgoals and taking the entire plan to that subgoal. But if we don’t really have the goals to use, we might need to use transitions to get us from start state to end state, and then you can use methods like Monte Carlo tree search, basically some search, to get from start state to end state. I think that is useful. So yes, we should do lookahead planning. In this example they don’t have much lookahead planning because they can decompose everything into subgoals. Maybe you can think of self-check as a way of lookahead planning, to look ahead and make sure the plan works, but this is more like a secondary kind of thing, because they don’t use it to generate the plan. The plan is usually generated just from a successful trajectory and then you condition on it. If you don’t have a successful trajectory, then you have to hope that the controller can get you there. That is one of the downfalls of JARVIS-1: the controller itself may not be able to get you there. If you have a missing step in memory, like from iron pickaxe to diamond pickaxe, you may not know how to do it. So for example, if you want to find a diamond, you need to rely on the fact that your controller can get the diamond in this substep. If that substep cannot work, then the whole system will fail. So I think lookahead is important, and it is not really there in this paper, but some form of Monte Carlo lookahead using a tree structure I think is useful for planning.

Lastly, memory soup. Memory soup is basically the idea that you store various forms of abstractions of memory in the same place. I was thinking that you can store long- and short-term memory: long-term transitions, for example iron pickaxe to diamond pickaxe, and also short-term transitions, for example iron pickaxe to diamond. Even shorter-term ones would be like the left side of a bridge to the right side of a bridge; you might also do the navigation part like that. I do think if you encode the memory in different areas like that, you might be able to express plans at different levels of hierarchy, different scales and so on, and maybe this could help with lookahead planning as well. So this is something I’m interested in; in fact all three of these are something I’m interested in, and unfortunately none of the three is done in this paper. So there’s a lot of room for improvement for JARVIS-1. I think they only hit about 10% of what can be done with memory; there’s still 90% more that can be done for an adaptable agent.
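
As a rough illustration of storing transitions at mixed levels of abstraction and searching over them, here is a minimal sketch. It swaps in plain breadth-first search for the Monte Carlo tree search mentioned above, and the `TRANSITIONS` list and `lookahead` function are hypothetical, made up for this example rather than taken from JARVIS-1.

```python
from collections import deque

# Hypothetical transition memory: (start_state, end_state, how), mixing
# short-term steps with one long-term transition in the same store.
TRANSITIONS = [
    ("nothing", "wooden_pickaxe", "gather logs, craft planks, sticks, pickaxe"),
    ("wooden_pickaxe", "stone_pickaxe", "mine cobblestone, craft"),
    ("stone_pickaxe", "iron_pickaxe", "mine iron ore, smelt, craft"),
    ("iron_pickaxe", "diamond", "dig down and mine diamond ore"),
    ("diamond", "diamond_pickaxe", "craft with sticks"),
    ("iron_pickaxe", "diamond_pickaxe", "long-term memory of the whole stretch"),
]


def lookahead(start, goal, transitions):
    """Breadth-first search over stored transitions.

    Returns the shortest chain of transitions from start to goal,
    or None when a step is missing from memory.
    """
    graph = {}
    for src, dst, how in transitions:
        graph.setdefault(src, []).append((dst, how))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for dst, how in graph.get(state, []):
            if dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(state, dst, how)]))
    return None


for src, dst, how in lookahead("nothing", "diamond_pickaxe", TRANSITIONS):
    print(f"{src} -> {dst}: {how}")

# With the later transitions missing, the search reports that no plan exists.
print(lookahead("nothing", "diamond_pickaxe", TRANSITIONS[:3]))
```

When the long-term transition is in memory the search uses it as a shortcut; when a step is missing it returns `None`, which is the case where the agent would have to rely on the controller alone.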

Okay, if not, that’s all I have for today. Thanks for listening. Any last comments before I close the session? Okay, if not, thanks so much. Oh sorry, did anyone speak just now? Yeah, John. Actually I wonder, because all these are learning from the wiki, correct? Yes. So I wonder whether in future LLMs will be able to self-learn more by really exploring the situation, the landscape. I think an LLM alone cannot. What you need is an LLM with an exploratory mechanism. What is an exploratory mechanism? It could be something based on what is already in memory: basically, if you already know something, you don’t do it, and you can do this based on how many experiences you have in memory.

It’s like the chicken rice versus duck rice thing: why do we not want to eat chicken rice all the time? I recently came up with a hypothesis, which is that we want to diversify our experiences. You don’t want to just keep eating chicken rice, so somehow, inbuilt in us, we have this method of diversifying our experiences. So we diversify with duck rice sometimes, and sometimes maybe you eat other things like hamburgers and so on. If we didn’t have this inner method of diversifying, we would eat the same food all the time, because why not? You just keep exploiting your memory and keep doing something that you know will work. Think about it: if you are very good at something, shouldn’t you want to keep doing the same thing again and again every day? But most people are not like that. They master something, they get bored, and they want to do something else. So there must be some exploratory mechanism to diversify your experiences. If not, you become stuck at one particular task and keep doing that task every day, and that’s not good for evolution, because you can’t adapt that well; you’re only good at one task. Not good for health either. Oh sorry, what did you say? Oh, I said it’s not good for health either, if you keep doing the same thing and keep eating the same food. Yes, correct.

But you realize there’s a way to bypass this exploration mechanism, and let me tell you how. The way is, surprisingly, or I don’t know whether it’s surprising for you, gambling probabilities. If you keep changing the outcome by some probability, people will keep pressing the slot machine. To them it’s a different experience every time: you press the slot machine, a different output comes out, and you don’t know for sure what made you get that transition. You play blackjack, sometimes you win, sometimes you lose, and people find it very enjoyable. So gambling is a play on our psychology to make us explore more, but to the benefit of the casino, of course. So I think this is some inbuilt mechanism to explore that is in us, and gaming companies, with gacha boxes and so on, exploit this psychology in order to profit.

But actually LLMs have this temperature, right? It’s kind of inbuilt also. Yes and no, because with temperature you might be able to sample some tokens, but you won’t sample everything. You might be stuck in a certain set and not get out. But maybe that’s sufficient. Maybe we shouldn’t want all agents to explore everything. You just do the exploration in a population: if one person gets stuck at a certain set of tasks, maybe another person will cover another set, and then as a population you explore everything. That’s possible also. So what you said about using LLM token probabilities to sample might be possible; it’s just that I think you should still condition it on your memory.
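
One way to picture exploration that is conditioned on memory is the toy sketch below. It is an assumption for illustration, not something from the paper or a real LLM API: tasks that appear less often in memory get a higher sampling probability through a softmax over negative experience counts, and a temperature parameter controls how strongly novelty is preferred. The names `task_probabilities` and `choose_task` are hypothetical.

```python
import math
import random


def task_probabilities(memory_counts, temperature=1.0):
    """Softmax over negative experience counts (temperature must be > 0).

    Tasks seen less often in memory receive more probability mass.
    """
    scores = {task: -count / temperature for task, count in memory_counts.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {task: math.exp(score - top) for task, score in scores.items()}
    total = sum(weights.values())
    return {task: weight / total for task, weight in weights.items()}


def choose_task(memory_counts, temperature=1.0, rng=random):
    """Sample the next task to try, conditioned on what memory already holds."""
    probs = task_probabilities(memory_counts, temperature)
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


counts = {"chicken_rice": 10, "duck_rice": 2, "hamburger": 0}
print(task_probabilities(counts))                     # strongly prefers the untried option
print(task_probabilities(counts, temperature=100.0))  # close to uniform
print(choose_task(counts, rng=random.Random(0)))
```

A low temperature pushes the agent hard toward untried tasks, while a high temperature makes the choice nearly uniform. In a population, different agents holding different counts would end up covering different tasks.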

So your memory might form the context, and then you choose what you want to do next. It’s very similar to the self-instruct part that we saw here. You see, in this self-instruct we use the memory and then we choose the task. So maybe we let the LLM choose the task, but you condition it on memory so that you have some basis. So I think what you say could very well work: using the token probabilities of the LLM to do exploration. I think that’s possible. If you prompt the LLM and say, I want to diversify and so on, you might get the tokens to output that diverse task already.

Talking about this, as we mentioned earlier, I think humans have this thing called utility, right? As utility increases, there are diminishing returns of the utility, which the LLM doesn’t have yet. So I’m thinking in future, if someone wants to build something more human-like, they could add this thing called utility. Yeah, definitely, something like a reward, right? It’s something where, when you eat too much chicken rice, you get more and more bored, so there are diminishing returns of the utility. I think you make it more human-like that way, where it encourages exploration.

I guess right now this is a philosophical thing already: do we believe that we do actions just to maximize pleasure, or maximize utility, and minimize pain? There’s this school of thought that wants to maximize pleasure and minimize pain; utilitarianism is to maximize utility, like what you are saying. That’s one school of thought, and I don’t agree with it, because I think it’s too hard for us to calculate utility for everything. If you have to choose between 10 drinks, you need to calculate the utility of all 10 of them. I think that’s not possible; that’s just too much effort. We are probably goal-directed beings. Yeah, the whole concept of utility and also maximization and equilibrium is really out of fashion now, even in mainstream economic thinking. I think it’s more focused on behavioral economics, which really relates to human behavior and all the complexities that come along with it.

Maybe I should do another session. I have this idea that optimization is not going to lead us to intelligence. You can optimize, but you’ll get a very narrow intelligence: you can play Go or chess very well, but that intelligence won’t be able to adapt to your environment. Absolutely right. And sometimes that individual optimization, if you bring it back to Hawkins’s point of a thousand minds, you might end up in conflicts, because when everybody tries to optimize, you cannot avoid conflict. Yeah, definitely. If everyone tries to optimize, you will definitely be worse off as a whole. If you optimize and you take away resources in a certain area, like large models now, everyone wants GPUs, everyone tries to optimize for GPUs, definitely the environment will be worse off. That’s one example, but more concretely, if you optimize for something you will necessarily lose out on the ability to do other things. Let’s say I’m a farmer: I just keep farming rice and I don’t farm corn. If everyone wants to farm rice, eventually no one will do corn. That’s why I think intelligence is more of a group thing. Even in economics, why do some farmers farm rice, some farm corn, and some farm meat? It depends on comparative advantage. I quite like the idea of comparative advantage: you do what you can do that other people are not doing. So it’s a group thing: based on what other people are doing, you calibrate accordingly and do something else, and that may be how we diversify as well, because it’s not going to be worth it for everyone to do the same thing. The diminishing returns are something that is true: if everyone does the same thing, the next few entrants will get less and less of the share of the pie.

Maybe that’s how diversity arises. So maybe what Ray said about utility has some merit, merit in terms of the resources you get: you may not get enough resources to survive. So it’s not so much that you feel happy doing it; it’s more that maybe your goal is to get food, but if you keep doing this task you won’t get food, so you change your behavior accordingly. I don’t know whether what I’m saying is making sense. I’m just saying that the goal-directed path can lead you to explore also, because if your goal is to survive and get food, then you will diversify so that you cover areas that other people haven’t covered yet, in order to increase your chances of getting food, for example.

Yeah I mean you can think of it it terms of money also like we we we work to get money but why do you work your current job like why do you do your current job is it because you’re very interested in your current job or is it because you

Have some like comparative advantage that you can yeah so so you can think about it like that so in order to do your go directed Behavior which is to earn enough to to survive you indan you diversify as a population okay I mean you may or may

Not diversify but at least the idea is that um there could be some exploration in that sense as well not just the individual sense but as a group sense there could be exploration just by calibrating your goals according to what other people are doing um John could you comment a bit about

Prom chaining in terms of uh getting an lrm to um you know solve a complex task and uh getting the lrm to break down a Tas into sub problem training is it as in you want to do it basically you want to make the LM solve simpler task right that’s

Yeah for for example if you wanted to build uh like in in Minecraft you want if you wanted to build some structure but you the L has to learn has to um you know use some subtask for example construction of a building may require uh getting blocks as the first

Task and then second task might be to construct uh one wall and then later add more elements to the building how do you um what is the research saying about how do we promp the LM to do such tasks I mean most commonly is this Chain of Thought

And the most common prom is let thing step by step um however I don’t like this prom because let thing step by step lets the LM decide what steps it is what is better is like if if you could already know the sequence of thought like for example in my chat death video

Um like you could form the product idea then you can go into like um find the modal find the type of code needed right like maybe for example python then after that you could um go into like UI ideas and so on so so you could like if you

Over here they use the waterfall model to to basically do the planning and so on so if you already know what what broad processes there is is to prom the LM to come up with the answer for each sequential step so yeah I think that’s more or less

It um in terms of like how to break it down into subtask if you can break down the process that will be useful we’ve been sorry um U oh no sorry sorry yeah so uh if you could already break down each process into you can also break down each process into substep

And if you already know the substeps beforehand, that’s even better. Or you can just ask the LLM to think step by step, which may not work that well. But the idea is that you break down a problem into concise substeps and ask the LLM to solve them.

In the Minecraft case, we can break down a problem like the enchanting table into obsidian, diamond and book, which we already know beforehand from the wiki. In fact, for this whole query generation, you don’t have to use an LLM. You can do this entire breakdown into sub-goals by rules. I mean, it’s totally rule-based, and Ghost in the Minecraft does this by rules. I don’t see why you need to use memory to do this in this case. Memory is something you use later to extract out the stuff. But if you really can break things down by rules based on your domain, you should just do it by rules. Why leave it to chance, right? The LLM has some form of stochasticity.

So once you break it down by rules, the other thing you need to do is this thing called domain match. Your query and your action space need to match semantically. Basically, based on pattern matching on some semantic words, you can match to some actions. For example, to go from point A to point B you can walk or swim: walk is on land, swim is in water. So you can say that walk is only for land tasks and swim is for underwater tasks.
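A rough sketch of both ideas, rule-based sub-goal breakdown and a keyword-style domain match, might look like the following. The recipe table, the action descriptions and the function names are all assumptions made for illustration. This is not how JARVIS-1 or Ghost in the Minecraft implement it.

```python
from typing import Dict, List

# Assumed rule table: the talk only names obsidian, diamond and book.
RECIPES: Dict[str, List[str]] = {
    "enchanting_table": ["obsidian", "diamond", "book"],
}

# Assumed action descriptions: each action lists the terrain words it applies to.
ACTION_DOMAINS: Dict[str, List[str]] = {
    "walk": ["land"],
    "swim": ["water", "underwater"],
}


def decompose(goal: str, recipes: Dict[str, List[str]]) -> List[str]:
    """Expand a goal depth-first into leaf sub-goals using fixed rules, no LLM."""
    if goal not in recipes:
        return [goal]  # nothing to expand, so this is already a leaf sub-goal
    leaves: List[str] = []
    for part in recipes[goal]:
        leaves.extend(decompose(part, recipes))
    return leaves


def match_action(query: str, action_domains: Dict[str, List[str]]) -> str:
    """Domain match: return the first action whose domain words occur in the query."""
    words = set(query.lower().split())
    for action, domains in action_domains.items():
        if words & set(domains):
            return action
    raise ValueError(f"no action matches query: {query!r}")


if __name__ == "__main__":
    print(decompose("enchanting_table", RECIPES))
    print(match_action("go to point B on land", ACTION_DOMAINS))
    print(match_action("reach the chest underwater", ACTION_DOMAINS))
```

Because both tables are plain data, the breakdown is deterministic: the same goal always yields the same sub-goals, which is the advantage over sampling them from a model.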

So if you explain this out in words, you can match based on the game description and so on, and this will help the LLM do the task better. So the main idea is to split into subtasks, and at the very last subtask there must be a domain match. I hope I answered your question.

Yeah, okay, got it. Domain match, that’s interesting.

Okay, if not, I think that’s more or less it for today. We exceeded the time quite a bit, but I like the discussion, and I do think this JARVIS-1 paper is a nice paper. The memory could be improved, but they did show that memory is important for learning. The controller could be improved as well. Just to cap it off, I think this is a great advancement towards real-life embodied AI. Eventually we might find that Minecraft is not diverse enough and not adaptive enough for real-world use cases, but we should solve Minecraft first because Minecraft is easier, and then move on to more realistic environments.

In all these environments I believe memory will be the key, and storing transitions will be more important than storing task success, because you may not be able to define tasks in real life. Reflection to form longer sequences of actions in order to solve some task will also be useful, I think. So memory in terms of transitions, reflection to form longer consolidated trajectories, and some form of search through a tree to link from start state to end state, for any arbitrary start and end state: I think all of this will be key to this kind of fast learning and adaptiveness.

One other thing: the memory needs to be updated. Right now they never update the memory. You need to update it with the latest environment transitions. Or, and this is another topic, you could have emotions in your memory. If something changes, you could have surprisal, and your surprisal gives a strong dopamine signal to encode that memory even more strongly. More on that next time. I do have a theory about emotions and memory and how they are used for learning.

Okay, that’s all I have for today. Thanks for coming, and I’ll see you all again next time. Okay, bye. Thank you, bye.
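The closing suggestion (store transitions rather than task successes, keep them updated, and search from a start state to an end state) can be sketched as a small graph memory with breadth-first search. This is a minimal illustration under assumptions: the class name, the state labels and the choice of breadth-first search are all hypothetical, and states are simplified to plain strings.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class TransitionMemory:
    """Stores (state, action) -> next_state and searches over what it has seen."""

    def __init__(self) -> None:
        self._edges: Dict[str, Dict[str, str]] = {}

    def update(self, state: str, action: str, next_state: str) -> None:
        """Record the latest observed outcome, overwriting any older one."""
        self._edges.setdefault(state, {})[action] = next_state

    def plan(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search; returns a shortest action list, or None."""
        if start == goal:
            return []
        parent: Dict[str, Tuple[str, str]] = {}  # state -> (previous state, action)
        visited = {start}
        queue = deque([start])
        while queue:
            state = queue.popleft()
            for action, nxt in self._edges.get(state, {}).items():
                if nxt in visited:
                    continue
                visited.add(nxt)
                parent[nxt] = (state, action)
                if nxt == goal:
                    # Walk back through the parents to rebuild the action sequence.
                    actions: List[str] = []
                    node = goal
                    while node != start:
                        node, act = parent[node]
                        actions.append(act)
                    return actions[::-1]
                queue.append(nxt)
        return None


if __name__ == "__main__":
    memory = TransitionMemory()
    memory.update("no_items", "chop_tree", "has_log")
    memory.update("has_log", "craft_planks", "has_planks")
    memory.update("has_planks", "craft_table", "has_crafting_table")
    print(memory.plan("no_items", "has_crafting_table"))
```

Since any stored state can serve as the start or the goal, the same memory answers queries for arbitrary start and end states, and calling `update` again with a new outcome changes later plans.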

This video, titled ‘JARVIS-1: Multi-modal (Text + Image) Memory + Decision Making with LLMs in MineCraft!’, was uploaded by John Tan Chong Min on 2023-11-21 05:45:20. It has garnered 425 views and 17 likes. The duration of the video is 01:50:16 or 6616 seconds.

JARVIS-1 is the latest way of using LLMs to solve the MineCraft environment. It has surpassed the performance of Voyager, but is slightly behind the performance of Ghost in the MineCraft (GiTM). However, it is the first of its kind to use images and text in a truly multimodal way of decision making!

There is also a curriculum generator using self-instruction with memory as a guide, and it also incorporates environmental feedback.

It has the mechanisms in place for self-learning similar to Voyager, and I think it could be better if we encode and retrieve memory more efficiently, execute sub-goals in a sequential fashion, and do the training of the controller better.

~~~~~~~~~~~~~~~~~

Slides: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/JARVIS-1.pdf

JARVIS-1 Repo (Code coming soon): https://github.com/CraftJarvis/JARVIS-1
JARVIS-1 Paper: https://arxiv.org/abs/2311.05997

MineCLIP (embedding model): https://arxiv.org/abs/2206.08853

Past videos:
Voyager: https://www.youtube.com/watch?v=Y-pgbjTlYgk
Ghost in the MineCraft: https://www.youtube.com/watch?v=_VXOczXIkks

~~~~~~~~~~~~~~~~~~

0:00 Introduction + Demo
2:31 Overview
3:34 Voyager Recap
6:20 Ghost in the MineCraft
12:11 JARVIS-1
15:19 Learning, Fast and Slow
17:33 Unlocking Entire Technology Tree
18:55 Situation-aware Planning
27:41 JARVIS-1 and Memory
38:33 Observational Space
40:02 Processing Images
46:02 Sub-goal planning
56:32 Storing and retrieving the memory
1:03:20 Generating the memories
1:10:59 Self-check
1:15:00 Result Analysis
1:20:05 Discussion

~~~~~~~~~~~~~~~~~

AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also, I am an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin

