AI, AI, AI: we keep hearing the term again and again. From factories to your home, artificial intelligence is everywhere. As AI evolved, so did the quest to teach machines to learn from data, a field known as “machine learning”. As machine learning advanced, new families of methods emerged within it. One of them is deep learning.

Instead of relying on task-specific algorithms, deep learning methods learn representations of the data themselves. Deep learning is now being used as a tool to automate front-end development. One of the biggest barriers to automating front-end development is computing power. Even so, automating it with deep learning can speed up prototyping and ease many of the issues involved in traditional front-end development.

Efforts to automate front-end development gained momentum after Tony Beltramelli published a paper introducing pix2code, which generates code from a screenshot of a graphical user interface. To take things to the next level, Airbnb launched sketch2code, capable of generating code from low-fidelity wireframes.

With deep learning algorithms and synthesized training data, you can already try your hand at automated front-end development. The finished system works in three steps.

  1. Give a design image to a trained neural network
  2. The neural network converts the image into HTML markup
  3. The generated markup is rendered as output

The network itself is built in three iterations. To begin with, create a bare-bones version to see how the pieces fit together. The second version automates all the steps and explains the neural network layers. The final version explores the LSTM layer.

Core Logic

Our goal is to create a neural network that generates HTML/CSS markup matching a screenshot. To train the network, you give it several screenshots together with their matching HTML. The network then learns by predicting the matching HTML markup tags one by one.

Predicting word by word is the most common approach, though there are others you can experiment with. Note that the neural network receives the same screenshot for every prediction. For instance, if it has to predict 20 words, it gets the same design mockup twenty times. For now, instead of worrying about how the neural network works internally, focus on its input and output.

The neural network uses the data you input to create features that connect the input data with the output data. With each prediction on the training data, it gets better at predicting the next tag. Using the trained model in a real-world application works much like training it: the text is generated one tag at a time, with the same screenshot every time. The difference is that you no longer input the correct HTML tags. Instead, the network receives the markup it has generated so far and predicts the next markup tag from it.
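The generation loop described above can be sketched in a few lines of Python. This is only an illustration: `predict_next_tag` is a hypothetical callable standing in for a trained model, `fake_model` simply replays a fixed answer, and the `<start>` and `<end>` markers are assumed names for the start and end tags.

```python
START, END = "<start>", "<end>"  # assumed marker names


def generate_markup(screenshot, predict_next_tag, max_tags=50):
    """Generate markup one tag at a time for a single screenshot."""
    tags = [START]
    for _ in range(max_tags):
        # The same screenshot is passed in on every step; only the
        # markup generated so far grows.
        next_tag = predict_next_tag(screenshot, tags)
        if next_tag == END:
            break
        tags.append(next_tag)
    return tags[1:]  # drop the start marker


def fake_model(screenshot, tags_so_far):
    """Hypothetical stand-in for a trained network: replays a fixed answer."""
    answer = ["<html>", "<body>", "Hello World!", "</body>", "</html>", END]
    return answer[len(tags_so_far) - 1]


markup = generate_markup("hello_world.png", fake_model)
print(" ".join(markup))
```

During training the loop looks the same, except that the tags fed back in are the correct ones from the training data rather than the model's own predictions.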

Basic Version

Start with the simplest case. Suppose you show the neural network a screenshot of a website displaying “Hello World” and teach it to generate the markup. The design mockup is first mapped into a list of pixel values: it is broken down into three channels, red, green and blue, with values ranging from 0 to 255.
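To make the pixel representation concrete, here is a minimal sketch in plain Python. The 2x2 "screenshot" is made up for illustration, and scaling the values to the 0.0-1.0 range is a common preprocessing step assumed here, not something the article prescribes.

```python
# A made-up 2x2 "screenshot": each pixel is (red, green, blue), 0-255.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (30, 144, 255)],
]


def split_channels(img):
    """Return three 2-D grids, one per colour channel."""
    red = [[px[0] for px in row] for row in img]
    green = [[px[1] for px in row] for row in img]
    blue = [[px[2] for px in row] for row in img]
    return red, green, blue


def scale(channel):
    """Scale 0-255 values to 0.0-1.0 (an assumed preprocessing step)."""
    return [[value / 255 for value in row] for row in channel]


red, green, blue = split_channels(image)
print("red channel:   ", red)
print("scaled red row:", scale(red)[0])
```

A real screenshot is handled the same way, only with far more pixels per channel.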

Next, represent the markup in a way the neural network can work with. The usual technique is one-hot encoding, in which each word becomes a vector of zeros with a single one marking its position in the vocabulary. Don’t forget to include start and end tags to give the neural network an idea of where to start and where to end its predictions.
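As a rough illustration of one-hot encoding with start and end tags, consider the sketch below. The marker names `<start>` and `<end>` and the helper functions are assumptions made for this example.

```python
def build_vocab(tokens):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def one_hot(token, vocab):
    """Return a vector of zeros with a single 1 at the token's index."""
    vector = [0] * len(vocab)
    vector[vocab[token]] = 1
    return vector


# Markup wrapped in assumed start and end markers.
markup = ["<start>", "<html>", "<center>", "Hello World!",
          "</center>", "</html>", "<end>"]

vocab = build_vocab(markup)
encoded = [one_hot(token, vocab) for token in markup]

for token, vector in zip(markup, encoded):
    print(f"{token:>14} {vector}")
```

Every vector is as long as the vocabulary, so the input size grows with the number of distinct words the network has to know.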

For the input data, use sentences: start with the first word and then add the next words one by one. The output is always a single word. Sentences must all have the same input length, which is set by a maximum sentence length, so if your sentence falls short of the maximum, you must fill it up with empty words. An empty word is a word made of just zeros.

Remember that words are filled in from right to left, with the empty words on the left, so each word changes position in every training round. This helps the model learn the sequence instead of memorizing the position of every word.
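The way input sentences grow word by word and are padded on the left can be sketched as follows. This is an illustration only: the `<pad>` token stands in for the all-zero empty word, and `make_training_pairs` is a hypothetical helper.

```python
PAD = "<pad>"  # stands in for the "empty word" made of zeros


def make_training_pairs(tokens, max_len):
    """Build (padded input sentence, next word) pairs from one token list."""
    pairs = []
    for i in range(1, len(tokens)):
        # Keep at most max_len of the words seen so far.
        context = tokens[:i][-max_len:]
        # Pad on the left so real words sit at the right-hand end.
        padded = [PAD] * (max_len - len(context)) + context
        pairs.append((padded, tokens[i]))
    return pairs


tokens = ["<start>", "Hello", "World!", "<end>"]
pairs = make_training_pairs(tokens, max_len=3)

for sentence, next_word in pairs:
    print(sentence, "->", next_word)
```

In the printed pairs, `<start>` moves one slot to the left with every round, which is the position change described above. Each pair would be presented to the network together with the same screenshot.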

Running the Code

Now, it is time to run the code. If you are new to deep learning, FloydHub is an ideal choice. It is a training platform for deep learning that lets you manage your experiments. Installing FloydHub and running your first experiment takes minutes. Start by cloning the repository, then log in and initialize the project from the command-line tool. Finally, run a Jupyter notebook on a FloydHub cloud GPU machine.

HTML Version

In the HTML version, the focus is mostly on creating a scalable implementation and on understanding the moving pieces of the neural network. Although the HTML version cannot predict HTML from random websites, it is helpful for exploring the dynamics of the problem.

There are two major sections in the HTML version.

  • Encoder
  • Decoder

The encoder is where the image features and the previous markup features are built. The decoder takes the combined design and markup features and creates a next-tag feature. This feature is then run through a fully connected neural network to predict the next tag.
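The data flow between encoder and decoder can be illustrated with a toy sketch in plain Python. Everything in it is a stand-in: the "encoders" are simple summaries rather than real CNN or LSTM layers, and the weights are made-up numbers rather than trained values. It only shows how two feature vectors are combined and passed through one fully connected layer to score the next tag.

```python
import math


def encode_image(pixels):
    """Toy image encoder: summarise the pixels as [mean, max]."""
    return [sum(pixels) / len(pixels), max(pixels)]


def encode_markup(one_hot_tags):
    """Toy markup encoder: count how often each tag has appeared so far."""
    return [sum(column) for column in zip(*one_hot_tags)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def decode(image_features, markup_features, weights):
    """Concatenate both features, then apply one fully connected layer."""
    combined = image_features + markup_features
    scores = [sum(w * x for w, x in zip(row, combined)) for row in weights]
    return softmax(scores)


pixels = [0.0, 0.5, 1.0, 1.0]            # scaled pixel values
previous_tags = [[1, 0, 0], [0, 1, 0]]   # two one-hot encoded tags
weights = [                              # 3 tags x 5 features, made up
    [0.1, 0.1, 0.5, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.5, 0.0],
    [0.3, 0.4, 0.2, 0.2, 0.9],
]

probabilities = decode(encode_image(pixels), encode_markup(previous_tags), weights)
next_tag_index = probabilities.index(max(probabilities))
print(probabilities, next_tag_index)
```

In a real model the encoders and the fully connected layer are all trained together, so the features they produce are learned rather than hand-written.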

Here are some of the mistakes you should avoid.

  • Starting work on the first version before collecting data
  • Not understanding the input and output data
  • Using LSTMs instead of CNNs when you are light on resources
  • Not using custom parsing for code
  • Not understanding the strengths and weaknesses of different models
  • Not using a pre-trained model even when one is relevant
  • Not customizing the model to run on a remote server
  • Having a poor understanding of library functions
  • Not trying different hyperparameters, models and CNN architectures
  • Using a resource-intensive model when experimenting

What are the biggest challenges you have faced when trying to convert design mockups into code with deep learning? Feel free to share them with us in the comments section below.

