Jesus Padron from the This Dot team shows you how to integrate AI models into a Next.js application. He walks through running Meta's Llama 3.1 model locally, using OpenAI's Whisper for speech-to-text, and using OpenAI's TTS model to voice the responses. By the end of the episode, listeners will know how to build an AI voice assistant that takes voice input, understands it, and responds audibly.
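The server-side flow is a transcribe → generate → speak loop. Here is a rough sketch of what a Next.js route handler for that loop could look like, assuming the openai Node SDK, hosted Whisper and TTS, and a local Llama 3.1 server exposing an OpenAI-compatible endpoint (for example Ollama at http://localhost:11434/v1); the exact setup shown in the episode may differ:

```ts
// app/api/assistant/route.ts — hypothetical Next.js App Router handler.
import OpenAI from "openai";

// Hosted OpenAI client for Whisper (speech-to-text) and TTS.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Assumed local Llama 3.1 server with an OpenAI-compatible API (e.g. Ollama).
const llama = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "ollama" });

export async function POST(req: Request) {
  // 1. Speech-to-text: transcribe the uploaded recording with Whisper.
  const form = await req.formData();
  const audio = form.get("audio") as File;
  const { text } = await openai.audio.transcriptions.create({
    file: audio,
    model: "whisper-1",
  });

  // 2. Generate a reply with the locally running Llama 3.1 model.
  const completion = await llama.chat.completions.create({
    model: "llama3.1",
    messages: [{ role: "user", content: text }],
  });
  const reply = completion.choices[0].message.content ?? "";

  // 3. Text-to-speech: voice the reply with OpenAI's TTS model.
  const speech = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: reply,
  });

  // Return the synthesized audio to the browser.
  return new Response(await speech.arrayBuffer(), {
    headers: { "Content-Type": "audio/mpeg" },
  });
}
```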
Chapters:
- Introduction to the Episode (00:00:03)
- Overview of Llama 3.1 and Setup (00:02:14)
- Setting Up the Next.js Application (00:04:40)
- Recording Audio with MediaRecorder API (00:11:37)
- Integrating OpenAI's Whisper for Speech-to-Text (00:36:46)
- Generating Responses with Llama 3.1 (00:48:24)
- Implementing Text-to-Speech with OpenAI's TTS (01:03:26)
- Final Testing and Demonstration (01:06:37)
- Summary and Next Steps (01:09:01)
- Closing Remarks (01:14:19)
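The recording step covered at 00:11:37 uses the browser's MediaRecorder API. A minimal client-side sketch is below; the /api/assistant endpoint is the hypothetical route handler sketched above, and the episode's component structure may differ:

```ts
// Hypothetical client helper: record a few seconds of mic audio, send it
// to the route handler above, then play back the spoken reply.
async function askAssistant(durationMs = 5000): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);

  const stopped = new Promise<void>((resolve) => {
    recorder.onstop = () => resolve();
  });

  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  // Release the microphone.
  stream.getTracks().forEach((track) => track.stop());

  // Upload the recording and play the synthesized response.
  const form = new FormData();
  form.append("audio", new Blob(chunks, { type: "audio/webm" }), "input.webm");
  const res = await fetch("/api/assistant", { method: "POST", body: form });
  const audio = new Audio(URL.createObjectURL(await res.blob()));
  await audio.play();
}
```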
Follow Jesus on Social Media:
- Twitter: https://x.com/padron4497
- GitHub: https://github.com/padron4497