Websocket Server From Scratch in Go

In this blog post, we will be building a very minimal websocket server from scratch over TCP using the Go programming language.

Overview

Before starting to code, let’s understand how the websocket protocol works. The websocket protocol is a full-duplex communication protocol that operates over TCP. It is stateful: once a connection is established it stays open, so the client and server can keep exchanging messages without starting a new request each time.

To create a websocket server, we first create an HTTP server that will help us complete the websocket handshake and upgrade the protocol from HTTP to WebSocket. Then we can parse individual websocket frames and communicate. You can refer to RFC 6455 - WebSocket Protocol for very detailed information.

Create a TCP server

Getting a simple TCP server up and running in Go is easy. We import the net package and create a net.Listener on a port number. Then we accept incoming connections and handle each one in a separate goroutine.

package main

import (
	"net"
	"fmt"
)

func main() {
	port := 3030
	addr := fmt.Sprintf(":%v", port)

	listener, err := net.Listen("tcp", addr)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Listening on port %v...\n", port)
	for true {
		conn, err := listener.Accept()
		if err != nil {
			fmt.Printf("Error accepting conn: %v\n", err)
			continue
		}
		go handleConn(conn)
	}
}

func handleConn(conn net.Conn) {
	defer conn.Close()
	fmt.Println("Handling new connection...")
}

Handle HTTP connection

In the handleConn function, we can now start to handle HTTP requests, and send HTTP responses. For example, a simple HTTP response from a server could be

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 11
Connection: Close

Hi client

Note that the line breaks are not bare newlines. In both requests and responses, every line ends with \r\n. The Content-Length of 11 counts the 9 bytes of "Hi client" plus the trailing \r\n. Now let’s edit our code to print the incoming request data, parse it, and send back a simple hello response.

// ...

func handleConn(conn net.Conn) {
	defer conn.Close()
	fmt.Println("Handling new connection...")

	// Read data sent to us in a buffer
	buf := make([]byte, 2048)
	for true {
		n, err := conn.Read(buf)
		if err != nil {
			if err == io.EOF {
				fmt.Println("Got EOF, closing connection..")
				break
			}
			fmt.Printf("Error reading to buf: %v\n", err)
			break // a failed read will not recover, so stop handling this connection
		}

		data := string(buf[:n])
		fmt.Printf("<--\n%v<--\n", data)

		req, err := parseHTTPReq(data)
		if err != nil {
			fmt.Printf("Error parsing http req: %v\n", err)
			continue
		}

		// Do nothing with parsed data for now, just send a hello response back
		_ = req
		var sb strings.Builder
		sb.WriteString("HTTP/1.1 200 OK\r\n")
		sb.WriteString("Content-Type: text/html\r\n")
		sb.WriteString("Content-Length: 11\r\n")
		sb.WriteString("Connection: Close\r\n")
		sb.WriteString("\r\n")
		sb.WriteString("Hi client\r\n")
		resp := sb.String()
		conn.Write([]byte(resp))
	}
}

type HTTPReq struct {
	method  string
	route   string
	proto   string
	headers map[string]string
	body    string
}

func parseHTTPReq(data string) (*HTTPReq, error) {
	lines := strings.Split(data, "\r\n")

	// Parse the top line
	reqLineParts := strings.Split(lines[0], " ")
	if len(reqLineParts) != 3 {
		return nil, errors.New("Can't split request line to 3 parts")
	}

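	// Parse headers, stopping at the blank line that ends the header block
	headers := make(map[string]string)
	i := 1
	for i < len(lines) && len(lines[i]) != 0 {
		parts := strings.SplitN(lines[i], ": ", 2)
		if len(parts) != 2 {
			return nil, errors.New("Malformed header line")
		}
		headers[parts[0]] = parts[1]
		i++
	}

	// Parse body if it exists: everything after the blank line
	var body string
	i++
	if i < len(lines) {
		body = strings.Join(lines[i:], "\r\n")
	}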

	return &HTTPReq{
		method:  reqLineParts[0],
		route:   reqLineParts[1],
		proto:   reqLineParts[2],
		headers: headers,
		body:    body,
	}, nil
}

Now we can run the server, open http://localhost:3030 in the browser and see the request sent by the browser. On my machine I see something like

% go run server.go
Listening on port 3030...
Handling new connection...
<--
GET / HTTP/1.1
Host: localhost:3030
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:126.0) Gecko/20100101 Firefox/126.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br, zstd
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Priority: u=1

<--
Got EOF, closing connection..

Upgrade the connection

Let’s observe what request we get while trying to make a websocket connection. For this we can use an API testing application like Postman. When we try to make a websocket connection via Postman, we get the request:

<--
GET / HTTP/1.1
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: lXya4QiFGAnldUlOjd1ezA==
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Host: localhost:3030

<--

The client sends an HTTP request containing the headers "Connection: Upgrade" and "Upgrade: websocket" to request a websocket upgrade. Our server has to accept this request and complete the handshake.

Note the value of the request header "Sec-WebSocket-Key". Our server has to derive a matching key from it and return that key in the response header "Sec-WebSocket-Accept". The response should have a status line saying "101 Switching Protocols". This completes the handshake.

To generate the accept key, we follow the rules specified in RFC 6455. We concatenate the request key with the magic string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", compute the SHA-1 hash of the combined string, and then base64 encode the hash.

const wsMagicString = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

func createWsAcceptKey(requestKey string) string {
	sha1bytes := sha1.Sum([]byte(requestKey + wsMagicString))
	return base64.StdEncoding.EncodeToString(sha1bytes[:])
}
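
To check that the function behaves as expected, we can feed it the sample key from the handshake example in RFC 6455 and compare the result with the accept value given there. The snippet below repeats the function so that it runs on its own.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

const wsMagicString = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// createWsAcceptKey is the same function as in the article, repeated here
// so the example is self-contained.
func createWsAcceptKey(requestKey string) string {
	sha1bytes := sha1.Sum([]byte(requestKey + wsMagicString))
	return base64.StdEncoding.EncodeToString(sha1bytes[:])
}

func main() {
	// Sample Sec-WebSocket-Key from the RFC 6455 handshake example.
	fmt.Println(createWsAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
	// The RFC lists s3pPLMBiTxaQ9kYGzzhZRbK+xOo= as the matching accept value.
}
```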

Now the handleConn can complete the handshake and establish a websocket connection.

// ...

func handleConn(conn net.Conn) {
	defer conn.Close()
	fmt.Println("Handling new connection...")

	// Read data sent to us in a buffer
	buf := make([]byte, 2048)
	for true {
		n, err := conn.Read(buf)
		if err != nil {
			if err == io.EOF {
				fmt.Println("Got EOF, closing connection..")
				break
			}
			fmt.Printf("Error reading to buf: %v\n", err)
			break // a failed read will not recover, so stop handling this connection
		}

		data := string(buf[:n])
		fmt.Printf("<--\n%v<--\n", data)

		req, err := parseHTTPReq(data)
		if err != nil {
			fmt.Printf("Error parsing http req: %v\n", err)
			continue
		}

		// Check for websocket upgrade headers in request
		if strings.Contains(req.headers["Connection"], "Upgrade") &&
			req.headers["Upgrade"] == "websocket" {
			// Complete the websocket handshake with appropriate response and accept key
			var sb strings.Builder
			acceptKey := createWsAcceptKey(req.headers["Sec-WebSocket-Key"])
			sb.WriteString("HTTP/1.1 101 Switching Protocols\r\n")
			sb.WriteString("Upgrade: websocket\r\n")
			sb.WriteString("Connection: Upgrade\r\n")
			sb.WriteString(fmt.Sprintf("Sec-WebSocket-Accept: %v\r\n", acceptKey))
			sb.WriteString("\r\n")
			resp := sb.String()
			conn.Write([]byte(resp))

		} else {
			// Else just send a hello response
			var sb strings.Builder
			sb.WriteString("HTTP/1.1 200 OK\r\n")
			sb.WriteString("Content-Type: text/html\r\n")
			sb.WriteString("Content-Length: 11\r\n")
			sb.WriteString("Connection: Close\r\n")
			sb.WriteString("\r\n")
			sb.WriteString("Hi client\r\n")
			resp := sb.String()
			conn.Write([]byte(resp))
		}
	}
}

// ...

Congratulations! We have successfully established a websocket connection.
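
The header check in handleConn compares the Upgrade value exactly, while RFC 6455 treats the "websocket" and "Upgrade" tokens as case-insensitive and browsers such as Firefox send "Connection: keep-alive, Upgrade". A more tolerant check might look like the sketch below. The function name isWebSocketUpgrade and the extra version and key checks are assumptions added for illustration, and the header names are still looked up with the exact casing used in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// isWebSocketUpgrade reports whether the parsed request headers ask for a
// websocket upgrade. Hypothetical helper, not part of the server above.
func isWebSocketUpgrade(headers map[string]string) bool {
	// Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
	hasUpgradeToken := false
	for _, token := range strings.Split(headers["Connection"], ",") {
		if strings.EqualFold(strings.TrimSpace(token), "upgrade") {
			hasUpgradeToken = true
			break
		}
	}

	return hasUpgradeToken &&
		strings.EqualFold(headers["Upgrade"], "websocket") &&
		headers["Sec-WebSocket-Version"] == "13" &&
		headers["Sec-WebSocket-Key"] != ""
}

func main() {
	headers := map[string]string{
		"Connection":            "keep-alive, Upgrade",
		"Upgrade":               "websocket",
		"Sec-WebSocket-Version": "13",
		"Sec-WebSocket-Key":     "lXya4QiFGAnldUlOjd1ezA==",
	}
	fmt.Println(isWebSocketUpgrade(headers))
}
```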

Handle WebSocket connection

Once we have established the connection, we must be ready for incoming messages. A websocket message can be split across multiple frames, each frame being a sequence of bytes that carries metadata and payload. To handle the connection, we first need to be able to parse each incoming frame.

Parse frames

A websocket frame is just a collection of bytes in a specific format. From RFC 6455, we can see the format for each frame.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

To parse these frames we will make a struct full of bytes, holding necessary information.

// ...

const (
	OpcodeContinuation = 0x0
	OpcodeText         = 0x1
	OpcodeBinary       = 0x2
	// 0x3 - 0x7 reserved for further non-control frames
	OpcodeClose = 0x8
	OpcodePing  = 0x9
	OpcodePong  = 0xa
	// 0xb - 0xf reserved for further control frames
)

// Stores bytes that are parsed from the websocket frame
type WsByteFrame struct {
	Final, Rsv1, Rsv2, Rsv3 byte
	Opcode                  byte
	Masked                  byte
	PayloadInitialLen       byte
	PayloadExtendedLen      []byte
	MaskingKey              []byte
	Payload                 []byte
}

// ...

Next, we add a function that reads from net.Conn and gives back the parsed WsByteFrame.

// ...

func parseWSFrame(conn net.Conn) (WsByteFrame, error) {
	var bf WsByteFrame

	// Parse first byte
	first := make([]byte, 1)
	_, err := io.ReadFull(conn, first)
	if err != nil {
		return bf, err
	}
	bf.Final = first[0] & 0b10000000
	bf.Rsv1 = first[0] & 0b01000000
	bf.Rsv2 = first[0] & 0b00100000
	bf.Rsv3 = first[0] & 0b00010000
	bf.Opcode = first[0] & 0b00001111

	// Second byte
	second := make([]byte, 1)
	_, err = io.ReadFull(conn, second)
	if err != nil {
		return bf, err
	}
	bf.Masked = second[0] & 0b10000000
	bf.PayloadInitialLen = second[0] & 0b01111111

	payloadInitialLen := uint64(bf.PayloadInitialLen)

	extendedSize := 0
	if payloadInitialLen == 126 {
		extendedSize = 2
	} else if payloadInitialLen == 127 {
		extendedSize = 8
	}

	bf.PayloadExtendedLen = make([]byte, extendedSize)
	_, err = io.ReadFull(conn, bf.PayloadExtendedLen)
	if err != nil {
		return bf, err
	}

	var payloadLen uint64
	switch extendedSize {
	case 0:
		payloadLen = payloadInitialLen
	case 2:
		payloadLen = (uint64(bf.PayloadExtendedLen[0]) << 8) | uint64(bf.PayloadExtendedLen[1])
	case 8:
		payloadLen = uint64(bf.PayloadExtendedLen[0])<<56 |
			uint64(bf.PayloadExtendedLen[1])<<48 |
			uint64(bf.PayloadExtendedLen[2])<<40 |
			uint64(bf.PayloadExtendedLen[3])<<32 |
			uint64(bf.PayloadExtendedLen[4])<<24 |
			uint64(bf.PayloadExtendedLen[5])<<16 |
			uint64(bf.PayloadExtendedLen[6])<<8 |
			uint64(bf.PayloadExtendedLen[7])
	}

	var maskSize = 0
	isMasked := bf.Masked == 0b10000000
	if isMasked {
		maskSize = 4
	}
	bf.MaskingKey = make([]byte, maskSize)
	_, err = io.ReadFull(conn, bf.MaskingKey)
	if err != nil {
		return bf, err
	}

	bf.Payload = make([]byte, payloadLen)
	_, err = io.ReadFull(conn, bf.Payload)
	if err != nil {
		return bf, err
	}

	return bf, nil
}

// ...

We are making slices of specific lengths and using io.ReadFull to read a specific number of bytes from the frame data at a time. Then we store the bytes into our frame struct.

Make a handler function

Now that we can parse frame data, we can start handling our websocket connection. Let’s make a function handleWSConn that parses the incoming frames in a loop. We can pass the existing connection from handleConn over to handleWSConn. If the websocket handler returns, the client has most likely disconnected, so we also return from the handleConn function.

@@ -68,6 +68,8 @@ func handleConn(conn net.Conn) {
 			sb.WriteString("\r\n")
 			resp := sb.String()
 			conn.Write([]byte(resp))
+			handleWSConn(conn)
+			return
 		} else {
 			// Else just send a hello response
 			var sb strings.Builder
@@ -83,6 +85,24 @@ func handleConn(conn net.Conn) {
 	}
 }

+func handleWSConn(conn net.Conn) {
+	fmt.Println("Handling ws connection...")
+	for true {
+		frame, err := parseWSFrame(conn)
+		if err != nil {
+			if err == io.EOF {
+				fmt.Printf("Received EOF from %v, closing ws connection\n", conn.RemoteAddr())
+				break
+			}
+			fmt.Println("Error parsing bytes:", err)
+			continue
+		}
+
+		// Do nothing with parsed frame for now
+		_ = frame
+	}
+}
+
 const wsMagicString = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

 func createWsAcceptKey(requestKey string) string {

Decode payload and store in a buffer

Each websocket frame sent by a client carries a masked payload, which is the actual message sent from the other side. The frame also contains a 4-byte masking key, which was used to mask the payload data. To decode the payload, we XOR each payload byte with the masking key byte at the same position modulo 4. We then have to store the decoded data in a buffer, because a single message can be split across multiple frames.

You may think: “How do we know the message has ended?”. Well, each frame contains a FIN bit at the very beginning to denote whether it’s the last frame of a message. Check the frame diagram above. We can check if that bit is set to know if the frame is final.

After decoding the final frame, we can handle the websocket message using a separate message handler function. Then we can clear out the buffer and start storing the next message.

@@ -1,6 +1,7 @@
 package main

 import (
+	"bytes"
 	"crypto/sha1"
 	"encoding/base64"
 	"errors"
@@ -87,6 +88,8 @@ func handleConn(conn net.Conn) {

 func handleWSConn(conn net.Conn) {
 	fmt.Println("Handling ws connection...")
+	var buf bytes.Buffer
+
 	for true {
 		frame, err := parseWSFrame(conn)
 		if err != nil {
@@ -98,11 +101,27 @@ func handleWSConn(conn net.Conn) {
 			continue
 		}

-		// Do nothing with parsed frame for now
-		_ = frame
+		// Decode the payload and store in buffer
+		decodedPayload := frame.Payload
+		for i := 0; i < len(decodedPayload); i++ {
+			decodedPayload[i] = decodedPayload[i] ^ frame.MaskingKey[i%4]
+		}
+		buf.Write(decodedPayload)
+
+		isFinalFrame := frame.Final == 0b10000000
+		if isFinalFrame {
+			msg := string(buf.Bytes())
+			handleWSMessage(msg)
+			buf.Reset()
+		}
 	}
 }

+func handleWSMessage(msg string) {
+	fmt.Printf("<--ws msg--\n%v--\n", string(msg))
+	// Handle it ...
+}
+
 const wsMagicString = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

 func createWsAcceptKey(requestKey string) string {
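
The XOR step can be tried on its own with the masked "Hello" frame that RFC 6455 gives as an example. The sketch below uses a hypothetical helper called unmask. Unlike the loop in handleWSConn, it leaves the payload untouched when there is no 4-byte key, which avoids indexing into an empty MaskingKey if an unmasked frame ever arrives.

```go
package main

import "fmt"

// unmask XORs payload in place with the 4-byte masking key.
// If key is not 4 bytes long (for example, an unmasked frame), it does nothing.
// Hypothetical helper for illustration only.
func unmask(payload, key []byte) {
	if len(key) != 4 {
		return
	}
	for i := range payload {
		payload[i] ^= key[i%4]
	}
}

func main() {
	// Masking key and masked payload of the "Hello" example in RFC 6455.
	key := []byte{0x37, 0xfa, 0x21, 0x3d}
	payload := []byte{0x7f, 0x9f, 0x4d, 0x51, 0x58}

	unmask(payload, key)
	fmt.Println(string(payload)) // Hello
}
```

Because XOR is its own inverse, calling unmask a second time with the same key gives back the masked bytes.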

Now if we run the server, open Postman, connect to localhost:3030 using websocket, and send a message “Hello server”, our server outputs the following

% go run server.go
Listening on port 3030...
Handling new connection...
<--
GET / HTTP/1.1
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: bf9iRhMKzFwHpYT+WJNN3A==
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Host: localhost:3030

<--
Handling ws connection...
<--ws msg--
Hello server--
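
A short message like this arrives in a single frame, so the buffering path is never exercised. To illustrate how the FIN bit drives reassembly, here is a small sketch that feeds two fragments, "Hel" and "lo", through the same buffer-then-flush logic. The fragment and messageAssembler types are hypothetical and exist only for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// fragment is a hypothetical stand-in for an already decoded frame.
type fragment struct {
	final   bool
	payload []byte
}

// messageAssembler collects payloads until a final fragment arrives.
type messageAssembler struct {
	buf bytes.Buffer
}

// push appends the fragment's payload. When the fragment is final, it returns
// the complete message and true, and resets the buffer for the next message.
func (a *messageAssembler) push(f fragment) (string, bool) {
	a.buf.Write(f.payload)
	if !f.final {
		return "", false
	}
	msg := a.buf.String()
	a.buf.Reset()
	return msg, true
}

func main() {
	var a messageAssembler
	fragments := []fragment{
		{final: false, payload: []byte("Hel")},
		{final: true, payload: []byte("lo")},
	}
	for _, f := range fragments {
		if msg, done := a.push(f); done {
			fmt.Println(msg) // Hello
		}
	}
}
```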

Ping and Pong

The client will occasionally send ping frames to the server to check that the connection is still alive. To answer these pings, our server sends pong frames back. Here the pong frame is just a copy of the ping frame with the opcode set to OpcodePong.

To send frames we can use a function that assembles the bytes from the WsByteFrame struct into a linear sequence and writes it to the connection.

@@ -101,6 +101,14 @@ func handleWSConn(conn net.Conn) {
 			continue
 		}

+		// Send pong frame back when pinged, no need to even decode the payload
+		if frame.Opcode == OpcodePing {
+			pongFrame := frame
+			pongFrame.Opcode = OpcodePong
+			sendWSFrame(conn, pongFrame)
+			continue
+		}
+
 		// Decode the payload and store in buffer
 		decodedPayload := frame.Payload
 		for i := 0; i < len(decodedPayload); i++ {
@@ -122,6 +130,17 @@ func handleWSMessage(msg string) {
 	// Handle it ...
 }

+func sendWSFrame(conn net.Conn, bf WsByteFrame) (int, error) {
+	first := bf.Final | bf.Rsv1 | bf.Rsv2 | bf.Rsv3 | bf.Opcode
+	second := bf.Masked | bf.PayloadInitialLen
+
+	bytes := []byte{first, second}
+	bytes = append(bytes, bf.PayloadExtendedLen...)
+	bytes = append(bytes, bf.MaskingKey...)
+	bytes = append(bytes, bf.Payload...)
+	return conn.Write(bytes)
+}
+
 const wsMagicString = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

 func createWsAcceptKey(requestKey string) string {

Closing the connection

To close the connection, the client sends a frame with the opcode set to OpcodeClose. The server responds by sending the same frame back, with the mask bit cleared, the masking key removed, and the payload replaced by its decoded form. We can implement that by adding a simple if condition.

@@ -117,6 +117,16 @@ func handleWSConn(conn net.Conn) {
 		}
 		buf.Write(decodedPayload)

+		// Close the connection if frame with OpcodeClose received
+		if frame.Opcode == OpcodeClose {
+			closeFrame := frame
+			closeFrame.Masked = 0
+			closeFrame.MaskingKey = []byte{}
+			closeFrame.Payload = decodedPayload
+			sendWSFrame(conn, closeFrame)
+			break
+		}
+
 		isFinalFrame := frame.Final == 0b10000000
 		if isFinalFrame {
 			msg := string(buf.Bytes())
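
The decoded payload of a close frame is not arbitrary data. According to RFC 6455, if a body is present it starts with a 2-byte status code in network byte order, optionally followed by a UTF-8 reason text. If we wanted to log why the client closed the connection, a sketch could look like this. parseClosePayload is a hypothetical helper and is not used by the server above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parseClosePayload splits a decoded close frame payload into its status
// code and reason. ok is false when the frame carries no status code.
// Hypothetical helper for illustration only.
func parseClosePayload(payload []byte) (code uint16, reason string, ok bool) {
	if len(payload) < 2 {
		return 0, "", false
	}
	return binary.BigEndian.Uint16(payload[:2]), string(payload[2:]), true
}

func main() {
	// 0x03 0xe8 is 1000, the status code for a normal closure.
	code, reason, ok := parseClosePayload([]byte{0x03, 0xe8, 'b', 'y', 'e'})
	fmt.Println(code, reason, ok) // 1000 bye true
}
```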

Conclusion

We successfully implemented a very basic websocket server over TCP in Go without using external help. The handleWSMessage function is where we can handle the messages, which is left for the reader to implement. For detailed information on the websocket protocol, check out the WebSocket RFC specification (RFC 6455).