Interested in learning how to use TensorFlow to detect hand sign language in your apps? By the end of this read, you will know how to implement TensorFlow in your application in a few simple steps. In our example today, we will be using Vue.
What is Tensorflow?
TensorFlow is an open-source, end-to-end platform (meaning it covers the whole workflow, from developing a system to delivering it in functional form) for building Machine Learning applications. TensorFlow enables you to build dataflow graphs and structures that define how data moves through a graph, taking inputs as a multi-dimensional array called a Tensor. You can read more on TensorFlow here.
What is a Model?
A model is a function with learnable parameters that maps an input to an output. A well-trained model will provide an accurate mapping from the input to the desired output.
Tensorflow Models
TensorFlow models are pre-trained models, and they fall into four defined categories:
- Vision: Analyze features in images and videos.
- Body: Detect key points and poses on the face, hands, and body with models from MediaPipe.
- Text: Enable NLP in your web app using the power of BERT and other Transformer encoder architectures.
- Audio: Classify audio to detect sounds.
If you want to go into more detail, check out Tensorflow Models.
All these models are broken down into subcategories, and in our case, we will be making use of the Body category, which has the hand pose detection we need in order to detect the hand signs.
Hand Pose Detection
This model uses 2D and 3D multi-dimensional arrays, which enable it to predict the keypoints of the hands. An example of a 2D array is `[[1,2],[3,5],[7,8],[20,44]]`, and that of a 3D array is `[[1,2,5],[3,5,8],[7,8,6],[20,44,100]]`.
This hand pose detection model comes from MediaPipe, as we established above, and it provides us with two model types: `lite` and `full`. The accuracy of the prediction increases from `lite` to `full`, while the inference speed decreases, i.e. the response time gets slower as the accuracy increases.
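Switching between the two is a one-word change in the detector configuration we will build later. The sketch below follows the shape used in this article's code; the choice of `"full"` here is purely illustrative:

```javascript
// Hypothetical detector configuration: trade inference speed for
// accuracy by choosing the model type.
const detectorConfig = {
  runtime: "mediapipe",
  modelType: "full", // or "lite" for faster, less accurate predictions
  solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/hands/",
};
```

Everything else in the setup stays the same; only `modelType` changes.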
What do we need?
There are a few dependencies we need to get things working, and I will also assume that you already have your project set up. You will need to add these dependencies to the project:
```shell
yarn add @tensorflow-models/hand-pose-detection

# Run the below commands if you want to use TF.js runtime.
yarn add @tensorflow/tfjs-core @tensorflow/tfjs-converter
yarn add @tensorflow/tfjs-backend-webgl

# Run the below commands if you want to use MediaPipe runtime.
yarn add @mediapipe/hands

yarn add fingerpose
```
Above, in the commands, you will notice we added fingerpose. Let's talk a little about what we need fingerpose for.
Fingerpose
Fingerpose is a gesture classifier for hand landmarks detected by MediaPipe hand pose detection. It also allows you to add your own hand gestures, which means that a gesture that signifies the letter `Z` can instead signify `Hello`, based on your fingerpose data. We will see an example of how the data looks in a bit. You can check out fingerpose for more details.
Get started
We are going to use Vue for this illustration. We will start by looking at the HTML first, and then we will cover the JavaScript.
Our template will be basic HTML with a video tag so we can show the video after getting access to our webcam.
Template
```html
<template>
  <div class="wrapper">
    <video
      ref="videoCam"
      class="peer-video"
      preload="auto"
      autoPlay
      muted
      playsInline
    />
  </div>
</template>
```
The snippet above shows a div and a video tag. The video element displays the stream once we gain access to the webcam.
We will now be writing the JS required to initialize the webcam.
Script
```html
<script setup>
import { onMounted, ref } from "vue";

const videoCam = ref();

function openCam() {
  let all_mediaDevices = navigator.mediaDevices;
  if (!all_mediaDevices || !all_mediaDevices.getUserMedia) {
    console.log("getUserMedia() not supported.");
    return;
  }
  all_mediaDevices
    .getUserMedia({
      audio: true,
      video: true,
    })
    .then(function (vidStream) {
      if ("srcObject" in videoCam.value) {
        videoCam.value.srcObject = vidStream;
      } else {
        videoCam.value.src = window.URL.createObjectURL(vidStream);
      }
      videoCam.value.onloadedmetadata = function () {
        videoCam.value.play();
      };
    })
    .catch(function (e) {
      console.log(e.name + ": " + e.message);
    });
}

onMounted(() => {
  openCam();
});
</script>
```
We imported two methods from Vue: `onMounted` and `ref`. The `onMounted` hook runs when the component is fully mounted, while `ref` is used to declare a reactive value that references the video element. If you look at the video tag in the template, you will notice a ref attribute. You can check out Template ref and the onMounted lifecycle hook.
In the openCam function, we first test whether mediaDevices is available on the browser's navigator object.
The MediaDevices interface provides access to connected media input devices like cameras and microphones, as well as screen sharing. In essence, it lets you obtain access to any hardware source of media data.
MediaDevices has a method, `getUserMedia`, which prompts the user for permission to use a media input. You can find all you need to know about `getUserMedia` here.
From the snippet, we can see that `getUserMedia` returns a promise, so we can get the media stream as a response using `then()`. We check whether the video element supports `srcObject`. If it does, we assign the media stream to `srcObject`; if not, we convert the media stream to a URL and assign it to the `src` of the video element.
With this snippet and a few styles, you should have your video showing your awesome face!
Introducing Tensorflow and Hand Detection
Now that we have our webcam working, we will update the template and the script in order to detect, predict, and display the letter of the alphabet that matches the hand sign prediction.
The updated HTML should now look like this:
```html
<template>
  <div class="wrapper">
    <video
      ref="videoCam"
      class="peer-video"
      preload="auto"
      autoPlay
      muted
      playsInline
    />
    <div class="alphabet">{{ sign }}</div>
  </div>
</template>
```
The div with class name `alphabet` will display the letter based on the hand sign prediction. We will be introducing two new functions: `createDetectionInstance` and `handleSignDetection`.
First, let's begin with `createDetectionInstance`, which is an integral part of the hand sign detection, and then we will introduce `handleSignDetection`, which predicts and displays the hand sign.
```html
<script setup>
import { onMounted, ref } from "vue";
import * as handPoseDetection from "@tensorflow-models/hand-pose-detection";

let detector;
const videoCam = ref();

function openCam() {
  …
}

const createDetectionInstance = async () => {
  const model = handPoseDetection.SupportedModels.MediaPipeHands;
  const detectorConfig = {
    runtime: "mediapipe",
    modelType: "lite",
    solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/hands/",
  };
  detector = await handPoseDetection.createDetector(model, detectorConfig);
};

onMounted(async () => {
  openCam();
  await createDetectionInstance();
});
</script>
```
To be able to detect hand poses, we need to create an instance of the hand pose detector, so here we created an asynchronous function, `createDetectionInstance`.
You can check out this Tensorflow blog to see more details.
Now that we have created an avenue to detect hand signs, let us start detecting the hand.
In that light, we will be adding a `handleSignDetection` function.
```html
<script setup>
import { onMounted, ref } from "vue";
import * as handPoseDetection from "@tensorflow-models/hand-pose-detection";

let detector;
const videoCam = ref();

function openCam() {
  …
}

const createDetectionInstance = async () => {
  const model = handPoseDetection.SupportedModels.MediaPipeHands;
  const detectorConfig = {
    runtime: "mediapipe",
    modelType: "lite",
    solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/hands/",
  };
  detector = await handPoseDetection.createDetector(model, detectorConfig);
};

const handleSignDetection = () => {
  if (!videoCam.value || !detector) return;
  setInterval(async () => {
    const hands = await detector.estimateHands(videoCam.value);
    if (hands.length > 0) {
      console.log(hands);
    }
  }, 2000);
};

onMounted(async () => {
  openCam();
  await createDetectionInstance();
  handleSignDetection();
});
</script>
```
The `handleSignDetection` function runs after the detection instance is created. We have a setInterval that runs every 2 seconds (the 2-second interval is arbitrary and can be shorter or longer) to check whether there is any hand sign, and a conditional statement to ensure the video element exists and the detection instance was created accordingly. We then call the detector's `estimateHands` method, which tries to predict the hand pose by returning keypoints whose values are either 2D or 3D (multi-dimensional arrays).
If you check your console log, you will see an array of data if any hand pose is detected.
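To make that console output easier to read, here is a rough, made-up sketch of the shape `estimateHands` returns. Note that a real result contains 21 keypoints per detected hand; the coordinates below are invented for illustration:

```javascript
// Illustrative shape of a detector.estimateHands() result
// (values are made up; a real hand has 21 named keypoints).
const hands = [
  {
    score: 0.97,
    handedness: "Right",
    keypoints: [
      { x: 120, y: 245, name: "wrist" },
      { x: 133, y: 210, name: "thumb_cmc" },
    ],
    keypoints3D: [
      { x: -0.01, y: 0.08, z: 0.02, name: "wrist" },
      { x: 0.01, y: 0.06, z: 0.01, name: "thumb_cmc" },
    ],
  },
];

// Flatten the 3D keypoints into [x, y, z] triples, the landmark
// format we will hand to fingerpose in the next step.
const landmark = hands[0].keypoints3D.map(({ x, y, z }) => [x, y, z]);
```

Each entry in `keypoints` is a 2D screen-space point, while `keypoints3D` adds a depth coordinate.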
Now that we can detect hand poses, we will add fingerpose, which will help predict and display the letter based on the hand sign.
```html
<script setup>
import { onMounted, ref } from "vue";
import * as handPoseDetection from "@tensorflow-models/hand-pose-detection";
import * as fp from "fingerpose";
import Handsigns from "@/utils/handsigns";

let detector;
const videoCam = ref();
let sign = ref(null);

function openCam() {
  …
}

const createDetectionInstance = async () => {
  const model = handPoseDetection.SupportedModels.MediaPipeHands;
  const detectorConfig = {
    runtime: "mediapipe",
    modelType: "lite",
    solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/hands/",
  };
  detector = await handPoseDetection.createDetector(model, detectorConfig);
};

const handleSignDetection = () => {
  if (!videoCam.value || !detector) return;
  setInterval(async () => {
    const hands = await detector.estimateHands(videoCam.value);
    if (hands.length > 0) {
      const GE = new fp.GestureEstimator([
        fp.Gestures.ThumbsUpGesture,
        Handsigns.aSign,
        Handsigns.bSign,
        Handsigns.cSign,
        …
        Handsigns.zSign,
      ]);
      const landmark = hands[0].keypoints3D.map((value) => [
        value.x,
        value.y,
        value.z,
      ]);
      const estimatedGestures = await GE.estimate(landmark, 6.5);
      if (estimatedGestures.gestures && estimatedGestures.gestures.length > 0) {
        const confidence = estimatedGestures.gestures.map((p) => p.score);
        const maxConfidence = confidence.indexOf(
          Math.max.apply(undefined, confidence)
        );
        sign.value = estimatedGestures.gestures[maxConfidence].name;
      }
    }
  }, 2000);
};

onMounted(async () => {
  openCam();
  await createDetectionInstance();
  handleSignDetection();
});
</script>
```
Assuming that our detector sensed a hand, it is time to match this value against the hand signs we created with fingerpose.
The `landmark` variable is a 3D array pulled from the hand result's `keypoints3D` property. There is also a `keypoints` property, which holds 2D values, and both will give the same result.
Now, using `GE.estimate`, we can generate the possible gestures that match the sign, with a score/confidence assigned to each predicted gesture. The gesture with the highest score/confidence is selected, since it is estimated to be the closest match to the hand sign among the fingerpose hand signs we created.
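That selection step can be sketched as a small standalone function. The gesture names and scores below are made up for illustration; they mimic the shape of `GE.estimate`'s `gestures` array:

```javascript
// Hypothetical gestures, shaped like GE.estimate()'s output
// (names and scores are invented for this example).
const gestures = [
  { name: "A", score: 7.1 },
  { name: "B", score: 9.4 },
  { name: "C", score: 8.2 },
];

// Pick the gesture with the highest score — the same selection the
// detection loop performs with indexOf + Math.max.
function bestGesture(gestures) {
  return gestures.reduce((best, g) => (g.score > best.score ? g : best));
}

const best = bestGesture(gestures);
// best.name === "B"
```

Whichever way you write the argmax, the result is the single gesture whose score is highest, and its `name` is what we display in the template.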
We also imported `Handsigns`, and its content looks like this:
You can also get the handsigns folder from the 100-ms-vue repository. Looking at the screenshot, there is a `GestureDescription` instance that takes a string, `A`, which represents what the hand sign will stand for. So, it could be anything you want the hand sign to stand for.
`onMounted` is asynchronous because we need to ensure that our detection instance is created before we start detecting hand signs.
With the updated code, you should be able to display some letters.
Conclusion
Don't forget: you can see in detail how this was implemented in one of This Dot Labs' open-source projects, 100-ms-vue. Please note that what we did is just a basic implementation; a production-ready version would need a bigger model and more complex detection to identify hand sign language.