
Level up your REST APIs with JSON Schema


JSON Schema isn’t a hot topic that gets a lot of attention compared to GraphQL or other similar tools. I discovered the power of JSON Schema while I was building a REST API with Fastify. What is it exactly? The website describes it as “the vocabulary that enables JSON data consistency, validity, and interoperability at scale”. Or more simply, it’s a schema specification for JSON data. This article is going to highlight some of the benefits gained by defining a JSON Schema for a REST API.

JSON Schema Basics

Here’s an example of a simple schema representing a user:

{
  "$id": "https://example.com/schemas/user",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string"
    },
    "lastName": {
      "type": "string"
    },
    "email": {
      "type": "string",
      "format": "email"
    },
    "age": {
      "type": "integer"
    },
    "newsletterSubscriber": {
      "type": "boolean"
    },
    "favoriteGenres": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": ["email"],
  "additionalProperties": false
}

If you’re familiar with JSON already, you can probably understand most of this at a glance. This schema represents a JSON object whose properties define a User in our system. Along with the object’s properties, we can define additional metadata about the object: we can describe which fields are required and whether or not the schema accepts additional properties that aren’t defined in the properties list.

Types

We covered a lot of types in our example schema. The root type of our JSON Schema is an object with various properties defined on it. The base types available in JSON Schema map to JSON’s own value types: object, array, string, number, integer, boolean, and null (integer isn’t a distinct JSON type; JSON Schema adds it for whole numbers). Check the type reference page to learn more.

Formats

The email property in our example has an additional field named format next to its type. The format keyword lets us attach semantic meaning to string values, so our schema can be more specific about the values allowed for a given field. For example, “hello” is a string, but it isn’t a valid value for our email property.

Another common example is for date or timestamp values that get serialized. Validation implementations can use the format definition to make sure a value matches the expected type and format defined. There’s a section on the website that lists the various formats available for the string type.

Schema Structuring

JSON Schema supports referencing schemas from within a schema. This is a very important feature that helps us keep our schemas DRY. Looking back to our initial example, we might want to define a schema for a list of users. We gave our user schema an $id of https://example.com/schemas/user, and we can use that identifier to reference it from another schema.

{
  "type": "array",
  "items": {
    "$ref": "https://example.com/schemas/user"
  }
}

In this example we have a simple schema that is just an array whose items definition references our user schema. This schema is exactly the same as if we defined our initial schema inside of "items": { }. The JSON Schema website has a page dedicated to structuring schemas.

JSON Schema Benefits

Validation

One of the main benefits of defining a schema for your API is being able to validate inputs and outputs. Inputs include things like the request body, URL parameters, and search parameters. Outputs are your response JSON data or headers. There are several different libraries available to handle schema validation. A popular choice, and the one used by Fastify, is called Ajv.

Security

Validating inputs has some security advantages. It can prevent bad or malicious data from being accepted by your API. For instance, you can specify that a certain field must be an integer, or that a string must match a certain regex pattern. This can help prevent attacks such as SQL injection and cross-site scripting (XSS).

Defining a schema for your response types can help prevent leaking sensitive data from your database. Your web server can be configured to strip any data that isn’t defined in the schema from your responses.

Performance

By validating data at the schema level, you can reject invalid requests early, before they reach more resource-intensive parts of your application. This can help protect against Denial of Service (DoS) attacks.

fast-json-stringify is a library that creates optimized stringify functions from JSON schemas, which can help improve response times and throughput for JSON APIs.

Documentation

JSON Schema also greatly aids in API documentation. Tools like OpenAPI and Swagger use JSON Schema to automatically generate human-readable API documentation. This documentation provides developers with clear, precise information about your API’s endpoints, request parameters, and response formats. This not only helps to maintain consistent and clear communication within your development team, but also makes your API more accessible to outside developers.

Type-safety

I plan to cover this in more detail in an upcoming post but there are tools available that can help achieve type-safety both on your server and client-side by pairing JSON Schema with TypeScript. In Fastify for example, you can infer types in your request handlers based on your JSON Schema specifications.

Schema Examples

I’ve taken some example schemas from the Fastify website to walk through how they would work in practice.

queryStringJsonSchema

const queryStringJsonSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    excitement: { type: 'integer' }
  },
  additionalProperties: false
}

We would use this schema to define, validate, and parse the query string of an incoming request in our API.

Given a query string like: ?name=Dane&excitement=10&other=additional - we can expect to receive an object that looks like this:

{
  name: "Dane",
  excitement: 10
}

Since additional properties are not allowed, the other property, which wasn’t defined in our schema, gets stripped out.

paramsJsonSchema

Imagine we had a route in our API defined like /users/:userId/posts/:slug

const paramsJsonSchema = {
  type: 'object',
  properties: {
    userId: { type: 'number' },
    slug: { type: 'string' }
  },
  additionalProperties: false,
  required: ['userId', 'slug']
}

Given this URL: /users/1/posts/hello-world - we can expect to get an object in our handler that looks like this:

{
  userId: 1,
  slug: "hello-world"
}

We can be sure about this since the schema doesn’t allow for additional properties and both properties are required. If either field were missing or didn’t match its type, our API could automatically return an appropriate error response code.

To recap what we’re getting here: we can provide fine-grained schema definitions for all the inputs and outputs of our API. Aside from serving as documentation and a specification, this powers validation, parsing, and sanitization of values. I’ve found it to be a very simple and powerful tool in my toolbox.

Summary

In this post, we've explored the power and functionality of JSON Schema, a tool that often doesn't get the spotlight it deserves. We've seen how it provides a robust structure for JSON data, ensuring consistency, validity, and interoperability on a large scale. Through our user schema example, we've delved into key features like types, formats, and the ability to structure schemas using references, keeping our code DRY. We've also discussed the substantial benefits of using JSON Schema, such as validation, enhanced security, improved performance, and the potential for type-safety. We've touched on useful libraries like Ajv for validation and fast-json-stringify for performance optimization.

In a future post we will explore how we can utilize JSON Schema to achieve end-to-end type-safety in our applications.
