Markov Job Posting Generator

Ever notice that job postings all look the same? They tend to have an introduction, a list of qualifications and/or responsibilities, and a conclusion. I am sure the format helps people quickly filter out jobs they are not qualified for, but it makes looking through job postings a repetitive process. If you are in doubt, take a look at the latest engineering jobs at Indeed, Workopolis and LinkedIn.

Inspired by a humorous post on Hacker News, where Tom Forbes created a funny site mimicking the look and feel of users’ submitted stories and comments, I created a web app that does something similar with job postings. Using pseudo-random text, you can get sentences that are mostly grammatically correct yet nonsensical and hilarious.

A few of my favourite random qualifications are below:

  • Must be legally entitled to work independently
  • Proven track record of executing customers on-time, on-spec and on-budget
  • Flexibility in order to achieve compliance

The job generator is currently hosted on Heroku and can be found at: https://markovjobpostinggenerator.herokuapp.com/.

Example Generated Mechanical Engineer “Job Posting”

You will work with customers, Engineering and must have a good fit for purpose hoisting system. You are proficient with all suppliers. As our ideal candidate: you are conversant with Windows XP Pro and Server 2003 operating systems, competent with the team to achieve various technical skills that will benefit from the Product and Technical Services teams during execution activities. Come explore us – where innovation, creativity and passion of our processes.

Key Responsibilities

* BS degree in engineering or civil engineering
* Mechanical Designer in oil and gas industry;
* Can work independently with minimal supervision on projects from concept to commercialization
* Prepare and develop manufacturing work instructions, test procedures, conduct testing and commercialization;
* Negotiates with suppliers concerning different aspects of the automated assembling equipment and appliances insuring compliance with AAR, FRA standards and procedures
* Demonstrated organizational skills and interpersonal, relationship-building skills
* Progressive ground up design experience

We promote continuous learning. Communicate project expectations to the on-going effort to identify and exploit business efficiency improvement opportunities as well as aiding the needs of its investments and the world’s built and natural environments in six key practice areas: Buildings, Earth & Environment, Energy, Industrial, Infrastructure, and Sustainability. Our never ending goal is to ensure consistency of work.

That sounds pretty good for a random job posting. Below I talk a bit about Markov Chains, explain where the data comes from and share some final thoughts. I’ve also listed the resources I used, to help anyone who wants to take a look at the code.

What is a Markov Chain?

A Markov Chain describes a sequence of possible events where the probability of each event depends only on the state reached in the previous event. Here, a Markov Chain is used to generate text where each new word is chosen based on the last two generated words. As the Agiliq blog explains it:

The algorithm is,

  1. Have a text which will serve as the corpus/base.
  2. Start with two consecutive words from the text. The last two words constitute the present state.
  3. Generating next word is the markov transition. To generate the next word, look in the corpus and find which words are present after the given two words. Choose one of them randomly.
  4. Repeat Step 3 until text of required size is generated.

The larger the corpus or the database of words, the better the generated sentences will be. Agiliq used “My Man Jeeves” by P. G. Wodehouse for their text but I used job postings.
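The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used by the app: the function names `build_chain` and `generate` and the tiny sample corpus are made up for this example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each pair of consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for first, second, following in zip(words, words[1:], words[2:]):
        chain[(first, second)].append(following)
    return chain


def generate(chain, length=30, rng=random):
    """Walk the chain, starting from a randomly chosen pair of words."""
    state = rng.choice(list(chain))         # step 2: two consecutive words
    output = list(state)
    while len(output) < length:             # step 4: repeat until long enough
        candidates = chain.get(state)
        if not candidates:                  # pair only occurs at the end of the corpus
            break
        next_word = rng.choice(candidates)  # step 3: random Markov transition
        output.append(next_word)
        state = (state[1], next_word)       # the last two words are the new state
    return " ".join(output)


# Hypothetical sample corpus (step 1); the real app uses scraped job postings.
corpus = (
    "Must be able to work independently . "
    "Must be able to lift heavy equipment . "
    "Must be willing to work with suppliers ."
)
chain = build_chain(corpus)
print(generate(chain, length=15))
```

Because repeated followers are stored as duplicates in each list, words that follow a pair more often in the corpus are also picked more often.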

Scraping Job Posting Data from Indeed.com

Indeed has many postings per day and a fairly good API that can be used to retrieve the data. You can sign up for the Indeed Publisher account needed to do this, and some tutorials are provided. I was unable to find an API for individual jobs, so I had to use the excellent Python scraping library BeautifulSoup to scrape the text of each job. Unfortunately this slows down the process and makes the app run much slower than ideal. The scraper then looks for bullet points and separates a job description into bullets and non-bullets. When the generator runs, the introduction and conclusion use the non-bullet corpus, while the qualifications/requirements use the bullet corpus.
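To give an idea of the bullet/non-bullet split, here is a rough sketch. Note that it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra dependencies, and it assumes bullets are marked up as `<li>` elements, which is not always true of real postings. The class and function names and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class BulletSplitter(HTMLParser):
    """Collect text inside <li> tags separately from all other text."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.bullets = []
        self.non_bullets = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
            self.bullets.append("")  # start a new, empty bullet

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            self.in_li = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace and newlines
        if not text:
            return
        if self.in_li:
            # Inline tags such as <b> split a bullet into several chunks.
            self.bullets[-1] = (self.bullets[-1] + " " + text).strip()
        else:
            self.non_bullets.append(text)


def split_posting(html):
    """Return (bullets, non_bullets) for one job description."""
    parser = BulletSplitter()
    parser.feed(html)
    parser.close()
    bullets = [b for b in parser.bullets if b]  # drop empty <li> elements
    return bullets, parser.non_bullets


# Hypothetical posting markup, not taken from a real listing.
sample = """
<p>We promote continuous learning.</p>
<ul>
  <li>BS degree in engineering</li>
  <li>Can work <b>independently</b> with minimal supervision</li>
</ul>
<p>Apply today.</p>
"""
bullets, prose = split_posting(sample)
print(bullets)
print(prose)
```

Each list can then be joined into its own corpus, one feeding the introduction and conclusion and the other feeding the bulleted requirements.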

Final Thoughts

The fictitious job postings are rather funny, though sometimes the grammar can be a bit off. Not everyone follows the same format, and some postings put bullets throughout the text, which causes problems when scraping the data. Also, the time required to scrape the job postings limits each run to 10-20 listings, which means the corpus is too small to get good sentences. One way around this is to scrape the data beforehand and store it in text files as a large, ready-made corpus. Live scraping is still included in the current version of the code, though, because people search for some funny jobs. You can see the top searched job at: http://markovjobpostinggenerator.herokuapp.com/top.
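The idea of scraping ahead of time could look something like the sketch below. It is not part of the current code: the helper names and the `bullets.txt` file name are assumptions, and entries are simply stored one per line.

```python
import tempfile
from pathlib import Path


def save_corpus(lines, path):
    """Append scraped lines to a text file, one entry per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        for line in lines:
            cleaned = " ".join(line.split())  # keep each entry on a single line
            if cleaned:
                handle.write(cleaned + "\n")


def load_corpus(path):
    """Read the stored entries back; return an empty list if nothing is saved yet."""
    path = Path(path)
    if not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines()


# Demo in a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    bullet_file = Path(tmp) / "bullets.txt"  # hypothetical file name
    save_corpus(["BS degree in engineering", "Progressive design\nexperience"], bullet_file)
    save_corpus(["Flexibility in order to achieve compliance"], bullet_file)
    print(load_corpus(bullet_file))
```

With files like this filled in by a separate scraping run, the web app would only need to read them at request time instead of fetching listings live.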

Resources

GitHub

https://github.com/jschembri/markovJobPostingGenerator

Text Markov Generator

http://agiliq.com/blog/2009/06/generating-pseudo-random-text-with-markov-chains-u/
http://www.yisongyue.com/shaney/
http://projects.haykranen.nl/markov/demo/
https://veekaybee.github.io/markov-in-python/

Heroku

https://devcenter.heroku.com/articles/getting-started-with-python#introduction
http://blog.y3xz.com/blog/2012/08/16/flask-and-postgresql-on-heroku

Flask

http://flask.pocoo.org/docs/0.10/tutorial/
http://blog.sahildiwan.com/posts/flask-and-postgresql-app-deployed-on-heroku/

Job Posting API

http://ca.indeed.com/publisher
http://www.indeed.com/jsp/apiinfo.jsp