Developers

Feb 16, 2026

Monsey

Becoming a 10x Engineer with LLMs

A meetup for developers, by developers.


Inside Look

[ OUR SPONSOR ]

Proudly Sponsored By

Devx Staffing

Support Their Work

About the company:

At Devx Staffing, we find and hire the current and future software superstars of the technology industry and match them with top-notch businesses who are ready to scale up.


Video Transcript

I have been doing LLMs for a while. My first LLM app was a tool that took Swagger specs. That's an OpenAPI spec, not OpenAI, and it describes an interface with its various calls. The pet store Swagger spec is the typical example: you can see which pets are available, adopt a pet, change the status of an adoption, things like that. Essentially, you gave my app the JSON spec and you could ask questions about it. I did this in 2023, on GPT-4 8K, if people remember that model. That was a long time ago in LLM years; this is punch card era. So this really goes back.

### Methodology and Feedback Loops

This talk is just random stuff that happens to work for me. It changes from week to week, and there is no one solution; if anyone tells you "this is the way to use AI or LLMs," be very skeptical of that. We're going to look at some demos, see what works and what doesn't, and cover some things I didn't realize until I went through and checked whether they were actually helpful.

The methodology was this: I would say, "I have this idea; I think it will be an improvement for LLM usage," and then I had Claude run the experiment. For example: here's a project, introduce a TypeScript error and fix it. I would run that with the TypeScript compiler attached, then without it, and see how good it was. That let me grade every one of these techniques to see what works and what doesn't. Some things I thought would be an obvious improvement and were not; we'll talk about that.

The first idea is to build feedback loops. I think the last time Walter Schu gave a talk in Monsey, also on AI, I mentioned this, and it was a big conversation piece. The typical example I give is just running the TypeScript compiler on the project. You make a change and run the compiler. If things broke because I changed a type definition or expanded how a variable can be used, that cascades through the whole project, and Claude can't really see all of that. But compilers are very good at deterministically figuring out whether the project is in a good state or not. So that's what you want to do.

There are two flavors of this. One is to simply say "run the TypeScript compiler" and assume it will listen to you; it's pretty good about that nowadays, though it used to be terrible. The other is to use hooks. That's a Claude-specific thing, but there are different flavors of it elsewhere. If you're using an agent that does not have hooks, you could throw a proxy in the middle and do roughly the same thing. That is not recommended, but it is an option.
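As a concrete sketch, here is roughly what that compiler feedback loop could look like as a standalone check script. The file name, and the idea of wiring it up so Claude Code runs it automatically after edits (for example, via a hook), are assumptions for illustration; the exact hook configuration and schema depend on your Claude settings and version.

```ts
// check.ts — a minimal feedback-loop sketch (file name assumed): type-check
// the whole project after each change and surface the verdict to the agent.
import { execSync } from "node:child_process";

try {
  // --noEmit type-checks without producing output; the compiler will
  // deterministically report every cascading break from a changed type.
  execSync("npx tsc --noEmit", { stdio: "pipe" });
  console.log("tsc: project is in a good state");
} catch (err) {
  // Non-zero exit: print the compiler errors so the agent can fix the
  // fallout before moving on.
  const out = (err as { stdout?: Buffer }).stdout;
  console.error(out ? out.toString() : String(err));
  process.exit(1);
}
```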
### Automated Integration Demos

Now let's see what this looks like. The demo is basically: add a type to an enum and see what happens. It adds the type, and then it runs the compiler. The compiler returns an error, because the enum is used somewhere else that it didn't know about, and then it fixes that up. That's it; it got the project back to a good state.

So that's one option. The other option, and people don't realize this, is that Claude Code actually has a VS Code integration. If you use it, you automatically get VS Code itself speaking to Claude when there are syntax errors or similar problems in the editor. It's more intelligent; it doesn't need to run all that linting and building itself.

*Doesn't Claude natively have LSP?*

I think that's a recent-ish thing, but that doesn't give you linting, and it doesn't give you other stuff either. With the integration, if a closing brace is missing, it realizes that right away instead of having to ask first. It's automatic: Claude edits the file in VS Code, VS Code notices, and VS Code tells Claude. If you actually watch it, it says "VS Code told me there are errors." It's a nice integration.

### Giving the LLM "Eyes"

That was the first tip. The second tip, another flavor of giving it a feedback loop, is to give it eyes. When you do UI development, you write the code, open it in a browser, and see how it looks. It doesn't look right, so you iterate: change the code, look again, back and forth. Some people do all of that manually; they ask Claude to do something, get the answer, try it, send a screenshot to Claude, and keep going back and forth. But you can skip all that and just give Claude a way to open the webpage itself. This is MCP tools; everyone's heard of those, I hope. Playwright is probably the most popular one, and there's actually a Chrome DevTools one as well.

This one isn't a live demo because I couldn't record this type of thing, but I gave it a test: open a page, look at what I'm trying to do, see what's broken, fix it, and keep going. This was the before picture, and this was the after. I didn't tell it what was wrong. I just said: I don't know what the problem is; use a browser, figure it out, and fix it. And it was able to do that.

I've been using this tremendously for complex layouts: some div is not scrolling for some reason, and it's on a resizable pane, and there's a flex layout, and I have Monaco editors with auto-hiding. I don't have to try to describe all that complexity, because Claude can just reach into the browser and see what the divs look like. That makes things very easy.
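To make the "eyes" idea concrete, here is a minimal sketch of the kind of look-and-check pass the browser tooling performs for the agent: load the page, collect console and page errors, and take a screenshot to reason about. The dev-server URL and file names are assumptions.

```ts
// inspect.ts — a sketch of the look-and-check pass: load the page, gather
// runtime errors, and take a screenshot, instead of a human relaying them.
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  const problems: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") problems.push(`console: ${msg.text()}`);
  });
  page.on("pageerror", (err) => problems.push(`pageerror: ${err.message}`));

  await page.goto("http://localhost:3000"); // assumed local dev server
  await page.screenshot({ path: "before.png", fullPage: true });

  console.log(problems.length ? problems.join("\n") : "no runtime errors seen");
  await browser.close();
})();
```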
### Tests vs. Documentation

Another thing: documentation goes out of sync very, very fast, and LLMs writing documentation makes this problem even worse. Everyone says the first thing they do is, "look how great LLMs are: look at my project and make documentation for everything." And it does a great job. And then they never ask it to write documentation again. So the documentation they had three months ago is still there, way into the project, when the project looks completely different. It never stays in sync.

On the other hand, if you have tests, tests run in CI or locally or whenever, and those are forced to be in sync. So you want to spec out your project with end-to-end tests, or whatever flavor of test fits, rather than with documentation.

Claude is really, really good at writing tests, especially end-to-end Playwright tests. It knows Playwright; Playwright has been on the bleeding edge of LLM work, and it had a testing framework before they did any LLM work. So they're really good about that. Let's see what that looks like. Over here, I'm having it write tests for a smart parser, something we're assuming already existed. Fast-forwarding a little: it wrote a bunch of tests, ran them, saw that they failed, and it's going to fix the issue.

These are very LLM-friendly tests. "Write a parser that takes slugs from a URL" is the most hello-world LLM test you can give it; it's such a well-understood problem. If you have a more complex task, some crazy business logic with a weird edge case buried somewhere, that's going to be a much harder thing to do. So take everything here with a grain of salt.
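For flavor, here is roughly what that kind of LLM-friendly suite looks like. The `parseSlug` function and its path are hypothetical stand-ins for the demo's parser; the assertions use Playwright's test runner, which handles plain unit checks like these fine.

```ts
// slug.spec.ts — hypothetical tests in the spirit of the demo.
import { test, expect } from "@playwright/test";
import { parseSlug } from "../src/parseSlug"; // assumed parser under test

test.describe("parseSlug", () => {
  test("extracts the slug from a post URL", () => {
    expect(parseSlug("https://example.com/posts/hello-world")).toBe("hello-world");
  });

  test("ignores trailing slashes and query strings", () => {
    expect(parseSlug("https://example.com/posts/hello-world/?ref=home")).toBe("hello-world");
  });

  test("returns null when there is no slug", () => {
    expect(parseSlug("https://example.com/")).toBeNull();
  });
});
```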
### YOLO Mode and Autonomy

The next tip is to tell the LLM what to do and let it figure out how to do it. This is an important part of getting out of the LLM's way. There's something called the YOLO flag, or YOLO mode: that's `--dangerously-skip-permissions`. It will yell at you if you do this as root, and technically you're not supposed to. But, you know, come on. Use common sense: if it's a regular project, you're not having it read your emails where people are trying to throw prompt injection at it, the area is well understood, and you control all the data, then don't worry about it. It's fine. The newer models are supposed to be pretty good at detecting it even if someone tried to inject something; I've never tried it, and that's the one thing I wouldn't do. I mostly do this on remote workstations, where the worst it could do is run some curl commands, potentially as me, if it really knew what it was doing. It can't format my hard drive locally.

I ran two different experiments here. One was sitting and watching Claude do its thing and manually approving everything; the other was just YOLO mode. The difference was about a 3x improvement in speed, even though I was just clicking "yes" all the time. This is what it looks like. I probably should have swapped those last two steps around; I think this is writing the validators. This run went three times faster. I probably lost the comparison clip somewhere in the edits.

### Agentic Workflows and Scaling

So far we've been looking at single instances of Claude doing one thing at a time. Claude is also able to multitask, and not only multitask: it can essentially fork itself and spread a workload out among different worker threads.

They've always had subtasks, but recently they added a new concept called agents. This is behind a flag that you have to set, an environment variable or something in your settings file. It's alpha, but it's insanely powerful. You give it a task, and you can describe what the team looks like, but you don't have to.

Let's read this one out. I want to build out a thing with various subsections; this prompt is very much a placeholder. I've done a similar thing with my UI application, which doesn't have as many tests as it should. I said: go to every single page, split the work across worker threads, try every clickable link, every button, everything you see, see what it does, and document it. That gave me a giant spec file. Then I had it write end-to-end tests for each module, which it also split out, and then run those tests and make sure they pass. Then I created a PR with that, ran it in CI, saw what failed, and told it what failed (that part was a little out of the loop, because CI is tricky to wire in), and it fixed them. Now I have a hundred-and-some-odd tests of a fully working program in CI, with video documentation, essentially. So that's another way of having documentation in the form of tests.

*So you codified all your bugs.*

Essentially. That was one of the concerns, especially because one of the things I have is screenshot testing, and those are extremely flaky; I think the allowed delta is three pixels, which is pretty rough. But I also have it set up so that CI produces the updated goldens and has you run a command to accept them, which is actually a Google-esque thing.

*And you don't even look at it, right?*

No comment.

So here's how this runs. These are all prerecorded, so it's going to work. It starts out by itself and then creates different workers to do different things: one is a documentation writer, one is a test-runner worker, and another is a README writer. I'm not even sure what the difference between some of them is. You could split it by feature, you could split it by task.

One thing I did: I created a Vite plugin that allows platform-specific UI development to happen locally, and I wanted to know how robust it was. So I had it create five worker threads, take a random UI project, clone it, add the plugin, launch the project, check whether it was accessible in the browser without its own auth, and then add a screenshot of the running application to a list of screenshots. That is a lot of steps for a person to do, but having an LLM do it autonomously was extremely powerful.

*How many agents do you use? At some point, aren't too many agents a problem?*

Right, at some point there will be too many. I created a workstation where I could choose the sizing, and I just made it a monster workstation.
The task was: create five workers as a worker pool with, say, 50 tasks. A worker takes a task, runs it, and when it's done, grabs the next task and runs it. So it's a worker pool of five workers running down a list of tasks, and it couldn't run more than five at once. Five was probably a decent number, but Claude was actually the thing that was crashing, so go figure. These applications are monstrous and Claude should not have been the bottleneck, but that just goes to show that Claude Code is really not optimized. It's kind of embarrassing. Especially in long conversations, Claude is going to struggle.

I probably should look into the new terminal everyone's using. The terminal emulator, like iTerm... Ghostty, there we go. I think that's really, really fast and will probably solve a lot of these problems, but I haven't tried it yet, partly because I run Claude in the terminal that's part of VS Code, so I'm not switching so fast.

*Doesn't it balloon costs to use sub-agents like that? Each one gets the context every single time.*

I'm sure it costs someone something. It doesn't cost me.

*4.6 with the 1-million-token context?*

Yeah, it's expensive. I'm sure it's a line item somewhere.

*So you pay for the API for that, or a subscription?*

My company does.

*How can I get in on that?*

Send me your resume; I'll put in a referral.

*Does Ghostty work on Windows?*

I think it's a MinGW type of thing if you want to use Windows. It's meant for Linux, but I'm sure it works on Windows too.

*So you described how many agents you wanted, and then it went ahead and orchestrated those agents?*

Yes, and I'm going to explain exactly how I did that.

### Orchestration and The "Claude File"

I basically had one Claude file. This Claude file said: we are doing this task; we are testing this plugin; here is a list of 300 projects; randomly shuffle them, pick the first 50, and split that across five worker threads. Each worker thread then launches Claude again as a subprocess, and that is a level-three Claude. In the Claude file I wrote: here is what the level-one Claude does, here is what level two does, here is what level three does. The first thing I tell each one is: you are level two, with this project; go read the Claude file and go on. That was it. I didn't have to have Claude one prompt Claude two with all of that context; I just pointed at the file. It has everything it needs; just let it go.

This is a great approach because as the documentation updates, when you see something isn't working and you update the Claude file, everyone can read everyone else's instructions. The level threes are allowed to know about the level ones. It's not their job, but it gives them helpful context: "this is the overall task, so I'm not going to fixate on this dumb little bug I see along the way, because that's not the goal here." It's very helpful in that regard, and it starts to have compounded returns.
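Here is a hedged sketch of that shape: a pool of five workers draining a shared task list, each task handled by a lower-level Claude launched as a subprocess and simply pointed at the shared Claude file. The invocation, project names, and prompt wording are assumptions, and the speaker notes elsewhere that `claude -p` is awkward for streaming, so treat this as the structural idea rather than the exact mechanics.

```ts
// pool.ts — hypothetical orchestration sketch: a pool of workers draining a
// shared queue, each task handled by a subprocess Claude pointed at CLAUDE.md.
import { spawn } from "node:child_process";

const tasks = ["project-alpha", "project-beta", "project-gamma"]; // e.g. 50 shuffled repos
const POOL_SIZE = 5;

function runTask(project: string): Promise<void> {
  return new Promise((resolve, reject) => {
    // Assumed invocation: point the child at the shared Claude file instead
    // of re-prompting all of the context by hand.
    const child = spawn(
      "claude",
      ["-p", `You are a lower-level worker for ${project}. Read CLAUDE.md and proceed.`],
      { stdio: "inherit" },
    );
    child.on("exit", (code) =>
      code === 0 ? resolve() : reject(new Error(`${project} exited with ${code}`)),
    );
  });
}

async function main(): Promise<void> {
  const queue = [...tasks];
  // Each worker grabs the next task as soon as it finishes the last one.
  await Promise.all(
    Array.from({ length: POOL_SIZE }, async () => {
      for (let t = queue.shift(); t !== undefined; t = queue.shift()) {
        await runTask(t);
      }
    }),
  );
}

main();
```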
### Creative Limits and Directing Results

LLMs are really good at doing tasks. They are terrible at coming up with the ideas for how to do those tasks. They'll do it very "enterprise," for lack of a better word: they'll keep throwing if-statements at the problem until they solve it. They won't take a step back and say, hold on a second, I'm approaching this the wrong way. So you need to step in sometimes and be the adult in the room: "you're doing this terribly, here's another way to look at it," or "this feels like a bit of a hack, could we reevaluate?" Just reset its operating mode.

Likewise, if you want to do something mildly interesting, they're really bad at coming up with ideas for that. A random example: I recently had a project with an Express router that is completely untyped, and I want to use it in my project. This example is going to go off the rails very fast, so I'll skip the details. Basically, I want to use the type system in a really unique way to make that safe, with type assertions and boxed classes, and an LLM will have no idea what that even means. But if I give it a really small ten-line example and say "run with this," it's really good at that. Ten lines is all it needs to understand what you're trying to accomplish. If you try to explain it in English, it's probably not going to work. Give it a small proof of concept in code that works, and that will get you very, very far.
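As an illustration of the kind of ten-line seed you might hand it, here is a sketch that reads "boxed classes" as a branded type; that reading is my assumption, not necessarily the speaker's exact technique.

```ts
// A roughly ten-line seed: a branded ("boxed") string type that lets an
// untyped router path be consumed safely through one audited assertion.
type Brand<T, B extends string> = T & { readonly __brand: B };
type TypedRoute = Brand<string, "TypedRoute">;

// The single place we assert; everything downstream stays fully typed.
function asTypedRoute(path: string): TypedRoute {
  if (!path.startsWith("/")) throw new Error(`not a route: ${path}`);
  return path as TypedRoute;
}

function register(route: TypedRoute): void {
  console.log(`registered ${route}`);
}

register(asTypedRoute("/pets/:id"));
// register("/raw"); // compile error: a plain string doesn't carry the brand
```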
The same applies to tests. If you say "write tests" and you see it's just mocking everything out when you wanted it more end-to-end, or it's mocking too little when you wanted more isolation, give it an example of a prototypical test that you want and have it go from there. It's really good about that.

LLMs are not smart. They are very good workers; they are not good thinkers, and not creative at all. That has to come from you, for now, and that is our main job when working with LLMs closely.

A bunch of random things: plan mode is really good, but you should read the plan and see that it makes sense. Don't use it just because it's one of the things you do, then switch and have it execute. There's a purpose to it: it's good for the model's internal monologue, basically thinking mode on steroids. But you still want to look at it.

*Opus 4.6 started entering plan mode on its own.*

It does, but not in YOLO mode. Well, it could, sometimes.

### Advanced Configurations and the Type System

*Do you use Opus 4.6 in Claude Code by default, or does it depend?*

I use Claude 4.6 in YOLO mode with high thinking, on fast mode, with the million-token context. This is very much worth it for my employer, no matter how much it costs.

*Are you the reason Netflix has gotten more expensive?*

Yes.

Another thing you want to do is use the tools that force your code into a well-defined structure that is auditable. Generally that means the type system. If you have untyped things, if you're using vanilla JavaScript without TypeScript, or you have no type enforcement in your TypeScript, things can morph a little too freely, and you don't need that flexibility for LLM-written code. We tend to cut corners because we're lazy: I have a string, I want a number, I'll just coerce it, unsafely. An LLM will actually parse it and throw an error if it's not a number. So let it; it's technically the better way to do it. Lean into those annoying best practices that it does by default. There are also specific things like slash commands and so on.

One thing I want to focus on: when you use LLMs, there are two parts. There's the work it does to get to a point, and there's the result. You want to focus on the result; how it got there is not so important. You should audit the result: if the result is code that you're going to put in your project, you need to look at that code. If the task is to rename all the files, you don't care how nice the script looks. The intermediate steps don't matter, so don't focus on them, unless they do matter.

### Practical Examples of Artifacts

I'll give a couple of examples. If I go back a couple of slides, this thing over here, in a different window, is asciinema, if I'm pronouncing that right. It's a screen recording as ASCII, so this is not a video: I can actually highlight and copy the text. You can see me selecting text inside what is essentially a video in a slide deck.

The deck itself is Slidev, a Vue project. I've never used it before; it's really, really cool. If you've ever seen a slide deck with a video embedded in it, you have to click the video, unfocus from the slide, go to the video, full-screen the video, and so on. One of the tasks I had Claude do was to sync things instead. There are two screens here: the screen you're looking at, and the screen I'm looking at with the notes. I press play on the screen with the notes, and it syncs to the screen you're looking at. It does that because I asked Claude to make it work, and Claude did it by creating a Vite plugin that opens a WebSocket server to keep a two-way channel between the windows. Would I have written it that way? Probably not. But I don't care, because I care about the result: I can now sync my slides between two different windows. That's an example of something it's really good at, where I don't care how it got there, as long as it gets the job done.
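A minimal sketch of what such a plugin might look like. The plugin name, port, and message protocol are all assumptions; this is just the broadcast-relay shape the talk describes, not the generated code itself.

```ts
// vite-plugin-slide-sync.ts — hypothetical sketch of the generated plugin:
// a WebSocket server that relays navigation events between windows.
import type { Plugin } from "vite";
import { WebSocketServer, WebSocket } from "ws";

export function slideSync(port = 4100): Plugin {
  return {
    name: "slide-sync",
    configureServer() {
      const wss = new WebSocketServer({ port });
      wss.on("connection", (socket) => {
        socket.on("message", (msg) => {
          // Broadcast each event (e.g. "goto slide 12") to every other window.
          for (const client of wss.clients) {
            if (client !== socket && client.readyState === WebSocket.OPEN) {
              client.send(msg.toString());
            }
          }
        });
      });
    },
  };
}
```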
If the result was that I needed to check this code in, I would look at the code, and then I'd care about that result. The artifact is important; the path to the artifact is not. That's the point I want to make here.

### Sub-Optimal Experiments

I wanted to touch on things that I thought would help but actually did not. This could just be because LLMs got better and some of these things are no longer needed. I had two different ways of doing Playwright, and actually a third I also tried. Playwright has a CLI mode: the Playwright CLI is a package that is not the Playwright MCP, and I think they're trying to replace the MCP with it. I had Claude run an experiment to see if it's actually better, because they claim it is, and Claude said it's actually not better.

I also had a different concept with Playwright scripts. Instead of using the Playwright MCP, which is very expensive and hard to make deterministic, write deterministic scripts for specific cases. If I'm able to reproduce a bug by following a specific sequence of steps, getting the agent to follow the same exact steps without a script is actually really hard. So: write a script to repro the bug, then fix it, then make sure the script passes. I call that a throwaway test, a throwaway Playwright script. I thought those would be a lot more efficient. They're actually not, because the back and forth Claude has while being in the loop as Playwright runs is really powerful. So that was surprising.

*How do you deal with it not taking consistent steps each time?*

I had that with a race condition: a network request loads a filter for a dropdown, and when the fresh data loaded, it would unset the state. That was one of the few times I needed speed to beat the race condition. It always took three seconds to load; the MCP is not necessarily faster, but a script that loads the page and clicks the thing will always play out the same way.

*When it uses the MCP, it's just sending it commands, right? So how do you make sure that the next time it runs, it runs the same exact steps?*

For those types of things, you write a script. I tell it to write a Playwright script that loads the page and repros the bug, and then fix it. Once you have that script in place, you can repro reliably. One of the things I did: I got a bug report that I couldn't repro, along the same lines, where you open this field, this other thing comes in, and the field closes. I said, I can't repro it; can you try? It said no, it wasn't able to. I told it to write a script to do it, and then I said: it could be something about navigation happening too, maybe the hash state is changing or something; play around with that. That's all I told it, and it was eventually able to repro it, which is shocking, and I was able to fix it with that. I was really impressed with that ability. That's a script as a very specialized use of the browser. Most of the time it's: load the page, click the thing, see if it works, and if it doesn't, why didn't it work? It's usually not that hard: the data didn't load; why didn't the data load? You have the wrong path, or something else. That's usually what it comes down to. Claude itself was recognizing that that covers 90% of cases; the other 10% of the time you will need one-off scripts, but most of the time not.
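Here is roughly what one of those throwaway repro scripts could look like, using the dropdown race described above. The URL, roles, and timing are invented for illustration.

```ts
// repro.ts — a hypothetical throwaway script for the dropdown race: replay
// the exact steps, let the slow filter request land, and check the state.
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  await page.goto("http://localhost:3000/items"); // assumed dev server
  await page.getByRole("combobox", { name: "Filter" }).click();

  // The filter options load from the network a few seconds later and used
  // to clobber the open dropdown's state; wait for them, then check.
  await page.getByRole("option", { name: "Active" }).waitFor();
  const stillOpen = await page.getByRole("listbox").isVisible();

  console.log(stillOpen ? "dropdown survived the data load" : "BUG: dropdown closed");
  await browser.close();
})();
```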
### The Claude File and Team Execution

The Claude file: if you have a specific task, like the three levels of Claude I was describing before, that works really well with a Claude file. But don't be too specific; don't force it into the solution. Describe the problem and let it explore. One of the things I always say is: experiment, try things out, do web searches, try different experiments, create subagents to research things, and when you're ready, come back with an answer. And I don't even watch it anymore; I'll look at it 20 minutes later.

*I don't understand: if Claude can read your code and see the whole project, what is the benefit of the MD file?*

If it could look at all the code and understand your project, that's pretty good. But when you have 400 files and all these different modules that do different things, there's all this complexity.

*No, say I want to do one thing that touches ten files, and it needs to focus on those ten files. How does it know which ten files? It won't know where to look.*

How do you know where to look? You need authentication, so you go into the auth middleware. You also need the GraphQL endpoint, so you look at the resolver. You have all the context already. It's not only your ten files: there are also 30 other files you didn't think mattered, but since you already know them, you didn't think about it. If you were to sit a new person down and tell them to do that, they would have to know all 50 files. That's what a Claude file is for: something that describes the overall project and the overall structure of how things work. You can also have it explore the codebase very extensively to create the file, and iterate that way. That's a decent starting point.

Another thing, and this is Claude Code specific: `claude -p`, the `--print` flag, is really bad with piping and streaming results, to the point where it's useless if you want to do anything iterative. It doesn't give you output until the very end, and you want to see the intermediate steps; you don't want it to spend ten minutes going down the wrong solution before you finally find out. So I essentially created an MCP tool to let Claude run Claude, and it runs interactively. It's called Teamer, and it basically runs Claude in tmux: it sends keys to it, literally types into the terminal session, and sees the results. There was really no other way to do it.
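The tmux trick itself is easy to sketch: start a detached session running interactive Claude, "type" a prompt into it, and read the pane back later to see intermediate steps. The session name and prompt below are assumptions; Teamer's actual internals are the speaker's own.

```ts
// A sketch of the tmux trick: run interactive Claude in a detached session,
// send keystrokes into it, and read the pane back to watch progress.
import { execSync } from "node:child_process";

const SESSION = "claude-worker"; // hypothetical session name

// Start a detached tmux session running interactive Claude.
execSync(`tmux new-session -d -s ${SESSION} "claude"`);

// Send a prompt as literal keystrokes, followed by Enter.
execSync(`tmux send-keys -t ${SESSION} "Read CLAUDE.md and start on task 1" Enter`);

// Later (or on an interval): capture what is currently on screen.
const pane = execSync(`tmux capture-pane -t ${SESSION} -p`).toString();
console.log(pane);
```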
### Creating the Slide Deck

All these demos were created essentially with that. I basically had Claude Code write a record.sh file that uses tmux to generate all these recordings. I'd say: the text is too small, make it bigger, I'm giving a demo; and it would change the columns and rows. Then I said: that looks terrible in a browser when it's in the slides, fix it; and it opened the browser, saw what looked wrong, and fixed it. It was a little more back and forth than that, but essentially, I didn't write this slide deck, and the slide deck is all code; the result is the slide deck, and that's all I care about. I don't care how the code looks. That's the idea of how to frame these types of problems and how to view LLM usage in general.

### Q&A

Okay, those were my slides. Questions?

*I've never used Claude, because I only recently started actually spending money on AI, so I've used ChatGPT. But two points you mentioned are things I've had a lot of trouble with. Number one: if I had a complex file and needed a test written, I found that if I gave it to ChatGPT and said, "I need to cover, say, 99% of the code, and here are the steps," it would mock almost everything. I tell it, "don't mock it, you're not actually covering any code here." "Oh, no problem, I'll mock it," and then it mocks the mock. How do you get it to stop?*

You have to give an example. Tests are one of the main places I see where you need to get it started. You give it one or two tests and say: this is the structure and the flavor of the tests I want you to write. And you can say: this is a very placeholder type of test; I need you to flesh it out, with more steps and more cases and more tests, but this level of mocking and this level of interactivity. It's really good about that.

*Another thing you mentioned was typing: just let it do its thing, because it's going to do it properly even if we don't necessarily care. I've had that a lot, and I've also had cases where I say, "get rid of this function, do it this way instead," and it implements a ton of code for backwards compatibility that I don't care about. Is that part of the same thing? It's along the same lines: it tries to make sure all the types match, and it also tries to make sure all of my functions match, even if I'm not using them anymore.*

Yeah, you'd have to make that part of the prompt. You can yell at it and say, this is not what I'm looking for. It'll learn over time to be more pragmatic and less defensive, essentially.

*One more question, triggered by a point I think you might have accidentally skipped: long sessions versus short sessions. I don't know if that's specifically a Claude thing, but when I'm having a conversation about longer scripts, I often get much better results by taking the last output, deleting the conversation, and starting fresh. I feel like it gets stuck in a loop because it's reading the entire conversation. Is that normal?*

Yeah.
Claude actually lets you fork a conversation, and you can clear context when you're doing that; there are different ways of doing it. But it's a known problem. 4.6 is much better about that, especially the way it compacts conversations. In 4.5 and previous models, when it compacted the conversation, it was basically over: you were trying to get the job done before it compacted, because at that point you were never going to recover. 4.6 is very good about that.

I did gloss over that point. On long versus short conversations: there are two schools of thought when it comes to working on features. One is to have Claude write your whole project. The other is to give Claude a task, and you put those pieces together, or have another Claude put the pieces together. I don't think the first way is a usable approach; I think that's how we get major slop. If you are involved in the process, that is how you get good results. So, good catch; I did mean to speak about that.

*One of the things that's been annoying me: it'll write extra code, like he was saying, and then I say, "okay, delete the code," and it goes "no problem" and leaves all the comments it wrote to itself. You know what I'm talking about? It won't remove its own comments. Have you been able to solve that at all?*

In Claude, you can revert the conversation state; you can revert the comments, you can revert the code.

*It deletes most of the file and then leaves the comments for itself, for later, in case it needs them. It's an LLM-ism: it puts in code comments everything you told it not to do. "We're doing this, not this and not this."*

One of the things I have in the Claude file is to look at the existing codebase to understand the level of comments to add. Don't go Uncle Bob on the comments, and don't take the other extreme; find a good, happy medium. The code should explain when something weird is happening, or why it's happening, when it makes sense. And, you know, at some point you had your own code before LLMs existed, hopefully.

*Is it still important to leave comments, or should you keep the actual plan files?*

I don't use plan files. I just prompt it and go.

*I get rid of them too. But I've seen people who just commit them.*

I think that's slop. I find it slop because there's just no point to it. It just accumulates, and it's cruft that never goes away. Do you need plans from three years ago when none of that code still exists?
*When you blame a piece of code and you go back, you find the PR and you find it. It's documentation that goes out of sync, sure, but you find the blame and you know why it was done at the time it was done.*

Yeah, but that's too many steps.

*A good PR description does that too; you can always get back to it from the blame. That's what the plan is. It's not for perpetuity; it's for when you go looking.*

I know, but it leaves a thousand plan files. Just put it in the PR; it's easier. Okay, let's take comments offline, I guess.


Precise talent for your team's needs