Windows OS Deployment via Ansible AWX Server on ESX Environment

Hi all, yes, it has been a while since my last post, but believe me, these days, working from home, I am working harder and am much busier than I was at the office.

In this post I would like to share how I automate my Windows OS installations in an ESX environment with Ansible AWX.

For these automation steps we need some knowledge of the tools and the environment. I will not explain how to install the systems or describe them in detail; I will give short descriptions, and you should follow the official documents to learn the basics of each tool.

What is AWX?

You can find many documents on the internet about it, but the GitHub project page explains it simply: “AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is the upstream project for Tower, a commercial derivative of AWX.” You will find many detailed how-to documents at https://github.com/ansible/awx

What is Ansible?

Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates. (Source: Ansible documentation page)

What is VMware ESX?

Most IT people know well what it is and how to manage it: it is the most popular hardware virtualization solution in corporate IT environments. You can find the latest updates about it on VMware's web site.

Let's continue with the Windows OS installation steps.

This is the general view of the AWX dashboard.

First things first: let's start with the ESX vCenter access credentials. That user needs full admin rights on the ESX system.

Next, for source code management, we need to create a project in AWX. I store the YAML code in our corporate GitHub.

You also need to create credentials for GitHub access so that AWX can download the YAML code, in the same way as the ESX vCenter access credentials.

Now we need a dummy inventory group for ESX server access. A dummy inventory is just an empty inventory group.

Now it is time to create a template for our Windows OS deployment job. In this section we choose which inventory group to use, which project holds the YAML code, and which playbook file to run.

Let me share my playbook YAML file to give you an idea of the virtual Windows OS deployment.

I am sharing the code as a downloadable file because indentation is very important in YAML, and copying and pasting from a web site could break it.

VM_Deploy_Cetin_20012020.yml

 

---

- name: Create VM Instance
  hosts: localhost
  connection: local
  gather_facts: false


  tasks:

    - name: Check if all variables have been defined
      fail:
        msg: "{{ item }} is not defined"
      when: vars[item] is not defined
      with_items:
        - datacenter
        - cluster
        - folder
        - vmname
        - datastore
        - vlan_name
        - template
        - ip
        - netmask
        - gateway
        - dns1
        - dns2
        - vm_password
        - fqdn_domain
        - domain_join_account
        - domain_join_password


    - name: Create a VM from a template
      vmware_guest:
        hostname: '{{ lookup("env", "VMWARE_HOST") }}'
        username: '{{ lookup("env", "VMWARE_USER") }}'
        password: '{{ lookup("env", "VMWARE_PASSWORD") }}'
        datacenter: "{{ datacenter }}"
        cluster: "{{ cluster }}"
        folder: "{{ folder }}"
        validate_certs: no
        name: "{{ vmname }}"
        template: "{{ template }}"
        datastore: "{{ datastore }}"
        state: poweredon
        # Module-level option (not a per-network key): wait until VMware Tools reports an IP.
        wait_for_ip_address: yes
        networks:
        - name: "{{ vlan_name }}"
          device_type: vmxnet3
          start_connected: yes
          ip: "{{ ip }}"
          netmask: "{{ netmask }}"
          gateway: "{{ gateway }}"
          dns_servers:
            - "{{ dns1 }}"
            - "{{ dns2 }}"

        customization:
          autologon: yes
          hostname: "{{ vmname }}"
          password: "{{ vm_password }}"
          domainadmin: "{{ domain_join_account }}"
          domainadminpassword: "{{ domain_join_password }}"
          joindomain: "{{ fqdn_domain }}"
          runonce:
          - C:\Windows\System32\cmd.exe /c "C:\Ansible_Workaround\domain_group.cmd"

      register: deploy_vm
      ignore_errors: yes

    - name: Result of Virtual machine
      debug:
        var: deploy_vm

When you check the code you will see that a couple of variables are defined in it, such as datacenter, cluster, folder, ip, template, etc.

We need to supply values for these variables in AWX. For this purpose we edit the template again and define a survey for it. Every question in the survey must map to a variable in the code, for example “fqdn_domain”; check the screenshot.
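To make the link between the survey and the playbook variables more concrete, here is a minimal Python sketch of launching the same template through the AWX REST API instead of the web page. It is not part of my setup: the AWX URL, the template id and the token are placeholders, and I am assuming token (Bearer) authentication is available on your AWX. The sketch only builds the request, and it refuses to do so when a survey answer is missing, similar to the first task of the playbook.

```python
import json
import urllib.request

# The variables the playbook checks in its first task.
REQUIRED_VARS = [
    "datacenter", "cluster", "folder", "vmname", "datastore", "vlan_name",
    "template", "ip", "netmask", "gateway", "dns1", "dns2", "vm_password",
    "fqdn_domain", "domain_join_account", "domain_join_password",
]


def build_launch_request(awx_url, template_id, token, answers):
    """Build (but do not send) the POST request that launches a job template."""
    missing = [name for name in REQUIRED_VARS if not answers.get(name)]
    if missing:
        raise ValueError("missing survey answers: " + ", ".join(missing))
    url = f"{awx_url.rstrip('/')}/api/v2/job_templates/{template_id}/launch/"
    body = json.dumps({"extra_vars": answers}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen() would actually send it. Keeping the building and the sending apart makes it easy to check the answers before anything is deployed.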

Now we need a virtual machine template for the deployment. I have created templates for this purpose for every Windows OS version.

As you know, an ESX environment can generalize (customize) a machine cloned from a template. We trigger this option automatically while cloning the machine.

The important point is that VMware Tools must be installed in the template, because AWX passes the operation steps to ESX, and ESX customizes the Windows OS through VMware Tools.

In my environment I am not a member of the domain admins group; I am a member of a specific OU admin group. That is why I have put a small run-once script, “C:\Ansible_Workaround\domain_group.cmd”, on the template machine:

@echo off
net localgroup administrators domain\OU_Admins /ADD
TZUTIL /s "Turkey Standard Time"

Let's demonstrate a deployment.

First go to the template and check it once more. If everything looks OK, press the rocket icon to start the deployment. Answer the questions about the VM name, IP, gateway, local admin password, etc.

After that, click deploy. The deployment starts and, depending on the speed of your environment, takes about ten minutes.

If it succeeds, you will see a screen like this:

I hope this document gives you an idea about Ansible AWX and the Windows deployment process.

Conclusion

Ansible is a big sea in the IT world. If you learn how to sail in it, you will find many ways to automate your daily job. For example, I use it to take weekly backups of more than a hundred Cisco switches. Maybe that will be another story on this blog.

I would like to give special thanks to my colleague Tolga Asik for his cooperation and VM knowledge, and also to Mustafa Sarı for his storage knowledge.

 

New Project with Devops Chain Tools

In April 2019, we started building a DevOps environment in our company while starting a new project.
I will briefly describe each of the main actors in the DevOps chain and finally show the continuous delivery flow of our project.

What is DevOps?

  • A set of software development practices
  • Combines software development and information technology operations to shorten the systems development life cycle
  • Delivers features, fixes, and updates frequently in close alignment with business objectives

Current DevOps Chain in our project:

A DevOps toolchain is a set or combination of tools that aid in the delivery, development, and management of software applications throughout the systems development life cycle, as coordinated by an organisation that uses DevOps practices. The picture below shows the actors in our project's DevOps chain:

What is Jira?

Jira is an issue tracking product developed by Atlassian that allows bug tracking and agile project management.

  • Plan – Create user stories and issues, plan sprints, and distribute tasks across your software team
  • Track – Prioritize and discuss your team’s work in full context with complete visibility
  • Release – Ship with confidence and sanity knowing the information you have is always up-to-date.
  • Report – Improve team performance based on real-time, visual data that your team can put to use.

What is GitHub?

GitHub is a code hosting platform for collaboration and version control. GitHub lets you (and others) work together on projects.

What is Jenkins?

Jenkins is an open source automation server which helps to automate the non-human part of the software development process:

  • building
  • testing
  • delivering or deploying

What is SonarQube?

SonarQube is the leading product for Continuous Code Quality which detects bugs, code smells, and security vulnerabilities on 20+ programming languages. With a Quality Gate in place (Jenkins), you can fix the leak and therefore improve code quality systematically.

Project Continuous Delivery Flow

Continuous delivery automates the entire software release process. Every revision that is committed triggers an automated flow that builds, tests, and then stages the update. The final decision to deploy to a live production environment is triggered by the developer. Here is the flow prepared for our project:
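The same idea can be written down as a small Python sketch. This is only an illustration, not our Jenkins configuration: the stage names and the two checks (tests_pass, new_bugs) are hypothetical stand-ins for the real build, test and SonarQube steps. Each committed revision runs through the automated stages, a failed stage stops the flow, and production needs an explicit approval.

```python
def run_delivery_flow(revision, stages, approved=False):
    """Run the automated stages in order; production needs an explicit approval."""
    log = []
    for name, check in stages:
        passed = check(revision)
        log.append((name, passed))
        if not passed:
            # A failed stage stops the flow; nothing after it runs.
            return "stopped at " + name, log
    if not approved:
        return "staged, awaiting approval", log
    return "deployed to production", log


# Hypothetical stages; each check receives the revision and returns True or False.
STAGES = [
    ("build", lambda rev: True),
    ("unit tests", lambda rev: rev["tests_pass"]),
    ("quality gate", lambda rev: rev["new_bugs"] == 0),
    ("deploy to staging", lambda rev: True),
]
```

For example, run_delivery_flow({"tests_pass": True, "new_bugs": 3}, STAGES) stops at the quality gate and never reaches staging.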

We are planning to add the Selenium test automation tool to our system and integrate it with Jenkins. I will share it soon.

Many thanks to Serhat Karataş for his contributions.

Application Owners Self Service Solution

Infraself For The Application Owners

Hi ,

This time, I would like to tell you a short story about my application owners' demands and requests, and a self-service solution.

In my company, application owners host their internal web applications on on-prem servers, and I am responsible for the infrastructure. The test, development and production infrastructures are separate, but all of them are on-prem. When the developers change the application configuration, or deploy new code or updates, they always come to me for an application service restart, an application pool restart, or even a server restart.

This is an annoying process for me and for them. They are responsible for the application, but I have to take care of its running status. Sometimes they upload wrong or buggy code and need to fix it and redeploy it to the infrastructure ten (10) times a day. That means I have to restart the services eleven (11) times a day. No, that was not suitable for my working style!

Under corporate rules, application owners can't have admin rights on any system, so I needed to find a solution for them and for myself.

I spoke to my manager about this process. He told me that if I could find a suitable solution he would give me full support, but under these conditions:

  1. With this solution they must not have admin rights on any server,
  2. They must not need to make a remote connection to any server,
  3. They should be able to restart their own application services or servers,
  4. Every action they take on those systems (restart, stop, start, etc.) should be logged,
  5. They must not be able to obtain any admin account information on those systems from any script or application.

So it looked like a challenge, and yes: challenge accepted!

What I had in my hands:

  • All the application systems are Windows Server versions,
  • All application owners are willing to take care of the full application life cycle,
  • PowerShell scripts will be fine for running all the Windows operations,
  • Our MS licensing gives me freedom to use all MS products.

How I solved my problem and the application owners' problem:

I built an RDWeb cluster for it. Two clustered servers accept the connections and forward the RDP sessions to the backend application sharing servers, so users don't know the RDP servers. I shared the PowerShell scripts via RDWeb.

When users log on to RDWeb, they can see any PowerShell script that has been shared with them. When they click on a script, it starts to run on the backend application server, so users cannot see the script's contents. I solved the rest of the problems inside PowerShell, which was the easy part.
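The core idea inside the scripts is small: only act on services from a fixed list, and write down who did what. Here is that idea as a short Python sketch, just for illustration; the real implementation is the PowerShell in this post, and the function and all names in the sketch are hypothetical.

```python
from datetime import datetime

# Actions an application owner is allowed to request.
ALLOWED_ACTIONS = {"start", "stop", "restart", "status"}


def audit_line(user, service, action, server, allowed_services, when=None):
    """Check the request against the allow-list and return the log line for it."""
    if service not in allowed_services:
        raise PermissionError(f"{user} may not manage {service}")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    when = when or datetime.now()
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp}: {service} {action} requested by {user} on {server}"
```

Because the check and the log line live in one place, an action cannot happen without leaving a record, which is what conditions 3 and 4 ask for.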

Here is the service restart script:

# Service Restart Script.
CLS
Function pause ($message)
{
# Check if running Powershell ISE
if ($psISE)
{
Add-Type -AssemblyName System.Windows.Forms
[System.Windows.Forms.MessageBox]::Show("$message")
}
else
{
Write-Host "$message" -ForegroundColor Darkgreen
$x = $host.ui.RawUI.ReadKey("NoEcho,IncludeKeyDown")
}
}

$Currentuser = [System.Security.Principal.WindowsIdentity]::GetCurrent().Name
$Username = 'Admin or service User Info'
$Password = 'Admin or service User Password'
$pass = ConvertTo-SecureString -AsPlainText $Password -Force
$SecureString = $pass
# Build a credential object from the user name and the secure string
$MySecureCreds = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $Username,$SecureString 
filter timestamp {"$(Get-Date -Format G): $_"}
$owner = 'App server_Service_Restart Script'
$ServerName = 'Application Server Hostname'
$LogPath = '\Shared\Log Path\to every\application sharing \_Restart.log'
$To = "application owner email"
$From = "from email"
$Subject = "Service Status Info"
$Cc = "monitoring team email grup"
$SmtpServer = "SMTP Server hostname"
$JobTime = Get-Date

Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing

$form = New-Object System.Windows.Forms.Form
$form.Text = 'Select the Service !!!'
$form.Size = New-Object System.Drawing.Size(300,200)
$form.StartPosition = 'CenterScreen'

$OKButton = New-Object System.Windows.Forms.Button
$OKButton.Location = New-Object System.Drawing.Point(75,120)
$OKButton.Size = New-Object System.Drawing.Size(75,23)
$OKButton.Text = 'OK'
$OKButton.DialogResult = [System.Windows.Forms.DialogResult]::OK
$form.AcceptButton = $OKButton
$form.Controls.Add($OKButton)

$CancelButton = New-Object System.Windows.Forms.Button
$CancelButton.Location = New-Object System.Drawing.Point(150,120)
$CancelButton.Size = New-Object System.Drawing.Size(75,23)
$CancelButton.Text = 'Cancel'
$CancelButton.DialogResult = [System.Windows.Forms.DialogResult]::Cancel
$form.CancelButton = $CancelButton
$form.Controls.Add($CancelButton)

$label = New-Object System.Windows.Forms.Label
$label.Location = New-Object System.Drawing.Point(10,20)
$label.Size = New-Object System.Drawing.Size(280,20)
$label.Text = 'Select Your Service:'
$form.Controls.Add($label)

$listBox = New-Object System.Windows.Forms.ListBox
$listBox.Location = New-Object System.Drawing.Point(10,40)
$listBox.Size = New-Object System.Drawing.Size(260,20)
$listBox.Height = 80

[void] $listBox.Items.Add('Service Name 1')
[void] $listBox.Items.Add('Service Name 2')
[void] $listBox.Items.Add('Service Name 3')
[void] $listBox.Items.Add('Service Name 4')
[void] $listBox.Items.Add('Service Name 5')

$form.Controls.Add($listBox)

$form.Topmost = $true

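$result = $form.ShowDialog()

# Stop here if the dialog was cancelled or no service was selected.
if ($result -ne [System.Windows.Forms.DialogResult]::OK -or -not $listBox.SelectedItem) { Exit }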

While ($true) {

if ($result -eq [System.Windows.Forms.DialogResult]::OK)

{
$x = $listBox.SelectedItem

$ITMessage = "If you have trouble with $x, please contact ITI."


Write-Host Which operation would you like to run on Service $x ?
Write-Host ----------------------------
Write-Host 1 - Start 
Write-Host 2 - Stop
Write-Host 3 - Restart
Write-Host 4 - Check Service Status
Write-Host 5 - Kill the Service Process Manually
Write-Host 0 - Exit
Write-Host ----------------------------
Write-Host Please enter only number of the command . 
}

$command = Read-Host -Prompt 'Please enter the command number '

If ($command -eq 1) {

$sonuc = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($x) (Start-Service -InputObject $x -PassThru | select Status,Name,PSComputerName) } -ArgumentList $x
Write-Host $sonuc -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x Service is started by $Currentuser via $owner at $JobTime on $ServerName " -SmtpServer $SmtpServer
Write-Output "$x Service is started by $Currentuser via $owner $ServerName"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 2) {

$sonuc = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($x) (Stop-service -inputObject $x -PassThru | select Status,Name,PSComputerName) } -ArgumentList $x
Write-Host $sonuc -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x Service is stopped by $Currentuser via $owner at $JobTime on $ServerName " -SmtpServer $SmtpServer
Write-Output "$x Service is stopped by $Currentuser via $owner on $ServerName"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 3) {

$sonuc = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($x) (Restart-Service -inputObject $x -PassThru | select Status,Name,PSComputerName) } -ArgumentList $x
Write-Host $sonuc -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x Service is restarted by $Currentuser via $owner at $JobTime on $ServerName" -SmtpServer $SmtpServer
Write-Output "$x Service is restarted by $Currentuser via $owner on $ServerName"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 4) {

$sonuc = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($x) (Get-Service -inputObject $x | select -Property Status,Name,PSComputerName)} -ArgumentList $x
Write-Host $sonuc -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x Service is checked by $Currentuser via $owner at $JobTime on $ServerName" -SmtpServer $SmtpServer
Write-Output "$x Service is checked by $Currentuser via $owner on $ServerName"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 5) {

$id = Get-WmiObject -computername $Servername -credential $MySecureCreds -Class Win32_Service -Filter "Name LIKE '$x'" | Select-Object -ExpandProperty ProcessId
$procname = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($id) Get-Process -id $id |Select-Object -ExpandProperty Processname} -ArgumentList $id 
Write-Host $x Service Process Name is $procname and the process is killing now.
Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($procname)get-process $procname | Stop-Process -Force -PassThru} -ArgumentList $procname
Write-Host "$procname was killed; the $x service needs to be started again." -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x Service process is killed by $Currentuser via $owner at $JobTime on $ServerName " -SmtpServer $SmtpServer
Write-Output "$x Service process is killed by $Currentuser via $owner on $ServerName"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 0) {
CLS
Exit
}

Else {
Write-Host "Please enter only number of the command !!! " -ForegroundColor Red
}
}

Here is the app pool restart script:

# App Pool Restart Script
CLS
Function pause ($message)
{
# Check if running Powershell ISE
if ($psISE)
{
Add-Type -AssemblyName System.Windows.Forms
[System.Windows.Forms.MessageBox]::Show("$message")
}
else
{
Write-Host "$message" -ForegroundColor Darkgreen
$x = $host.ui.RawUI.ReadKey("NoEcho,IncludeKeyDown")
}
}

$Currentuser = [System.Security.Principal.WindowsIdentity]::GetCurrent().Name
$Username = 'Admin or Service Account'
$Password = 'Password of Admin or Service Account'
$pass = ConvertTo-SecureString -AsPlainText $Password -Force
$SecureString = $pass
# Build a credential object from the user name and the secure string
$MySecureCreds = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $Username,$SecureString 
filter timestamp {"$(Get-Date -Format G): $_"}
$owner = 'Log Name '
$ServerName = 'IIS Server Hostname'
$LogPath = 'Log full path \share\logs\apppool_restart.log'
$To = "apppool owner email"
$From = "ServiceRestart@domain.com"
$Subject = "Service Status Info"
$Cc = "monitoring team email"
$SmtpServer = "smtp server hostname or fqdn"
$JobTime = Get-Date

Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing

$form = New-Object System.Windows.Forms.Form
$form.Text = 'Please select App !!!'
$form.Size = New-Object System.Drawing.Size(300,200)
$form.StartPosition = 'CenterScreen'

$OKButton = New-Object System.Windows.Forms.Button
$OKButton.Location = New-Object System.Drawing.Point(75,120)
$OKButton.Size = New-Object System.Drawing.Size(75,23)
$OKButton.Text = 'OK'
$OKButton.DialogResult = [System.Windows.Forms.DialogResult]::OK
$form.AcceptButton = $OKButton
$form.Controls.Add($OKButton)

$CancelButton = New-Object System.Windows.Forms.Button
$CancelButton.Location = New-Object System.Drawing.Point(150,120)
$CancelButton.Size = New-Object System.Drawing.Size(75,23)
$CancelButton.Text = 'Cancel'
$CancelButton.DialogResult = [System.Windows.Forms.DialogResult]::Cancel
$form.CancelButton = $CancelButton
$form.Controls.Add($CancelButton)

$label = New-Object System.Windows.Forms.Label
$label.Location = New-Object System.Drawing.Point(10,20)
$label.Size = New-Object System.Drawing.Size(280,20)
$label.Text = 'IIS AppPool Select:'
$form.Controls.Add($label)

$listBox = New-Object System.Windows.Forms.ListBox
$listBox.Location = New-Object System.Drawing.Point(10,40)
$listBox.Size = New-Object System.Drawing.Size(260,20)
$listBox.Height = 80

[void] $listBox.Items.Add('Web App Pool 1')
[void] $listBox.Items.Add('Web App Pool 2')
[void] $listBox.Items.Add('Web App Pool 3')
[void] $listBox.Items.Add('Web App Pool 4')
[void] $listBox.Items.Add('Web App Pool 5')
[void] $listBox.Items.Add('Web App Pool 6')


$form.Controls.Add($listBox)

$form.Topmost = $true

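$result = $form.ShowDialog()

# Stop here if the dialog was cancelled or no app pool was selected.
if ($result -ne [System.Windows.Forms.DialogResult]::OK -or -not $listBox.SelectedItem) { Exit }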

While ($true) {

if ($result -eq [System.Windows.Forms.DialogResult]::OK)

{
$x = $listBox.SelectedItem 

$ITMessage = "If you have trouble with $x, please contact ITI."

Write-Host Which operation would you like to run on AppPool $x ?
Write-Host ----------------------------
Write-Host 1 - Start 
Write-Host 2 - Stop
Write-Host 3 - Restart
Write-Host 4 - Check Apppool status
Write-Host 0 - Exit
Write-Host ----------------------------
Write-Host Please enter only number of the command . 
}

$command = Read-Host -Prompt 'Please enter the command number '

If ($command -eq 1) {

$sonuc = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($x) (C:\Windows\System32\inetsrv\appcmd.exe start apppool "$x" ) } -ArgumentList $x
Write-Host $sonuc -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x App Pool is started by $Currentuser via $owner at $JobTime " -SmtpServer $SmtpServer
Write-Output "$x App Pool is started by $Currentuser via $owner"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 2) {

$sonuc = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($x) (C:\Windows\System32\inetsrv\appcmd.exe stop apppool "$x" ) } -ArgumentList $x
Write-Host $sonuc -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x App Pool is stopped by $Currentuser via $owner at $JobTime " -SmtpServer $SmtpServer
Write-Output "$x App Pool is stopped by $Currentuser via $owner"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 3) {

$sonuc = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($x) (C:\Windows\System32\inetsrv\appcmd.exe recycle apppool "$x" ) } -ArgumentList $x
Write-Host $sonuc -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x App Pool is restarted by $Currentuser via $owner at $JobTime " -SmtpServer $SmtpServer
Write-Output "$x App Pool is restarted by $Currentuser via $owner"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 4) {

$sonuc = Invoke-command -credential $MySecureCreds -ComputerName $ServerName -ScriptBlock {param ($x) (C:\Windows\System32\inetsrv\appcmd.exe list apppool "$x" /text:state )} -ArgumentList $x
Write-Host $sonuc -ForegroundColor Green
Send-MailMessage -To $to -From $from -Cc $Cc -Subject $subject -Body "$x App Pool is checked by $Currentuser via $owner at $JobTime " -SmtpServer $SmtpServer
Write-Output "$x App Pool is checked by $Currentuser via $owner"| timestamp >> $LogPath
Write-Host $ITMessage
pause "Press any key to continue"
CLS
}

ElseIf ($command -eq 0) {
CLS
Exit
}

Else {
Write-Host "Please enter only number of the command !!! " -ForegroundColor Red
}
}

If you are familiar with PowerShell, I believe I don't need to explain the script steps.

With these PowerShell scripts they are able to restart their own application services or web app pools. It is an easy and simple solution for all of us.

On the other hand, it means less admin effort for me 🙂

Let me show you an example:

 

Log Management Solution with Elasticsearch, Logstash, Kibana and Grafana

What I needed and how I did it

Hello all,

I would like to share a solution which I built for my application team's log management.

They came to me looking for a solution. Their applications create custom logs, and those logs are stored on a Windows machine's drive. They needed a tool to monitor all the different log files and send an email when a specific error (keyword) occurs. They also wanted to see how many log entries are created in the system, visualized as graphs.

In the infrastructure team we use many monitoring tools for that kind of purpose, but none of them can understand unstructured log files, I mean application-specific log structures. Yes, they can understand Windows events or Linux message logs, but this time it was different: these log files were unstructured as far as our monitoring tools were concerned.

So I started looking for a solution. After a short search on the internet I found the tool: Elasticsearch, Logstash and Kibana, known as the Elastic Stack.

I downloaded the products and installed them on a test server. It was great and I was happy, because the logs were read and collected with Filebeat, parsed with Logstash, and stored in Elasticsearch, and I could easily query them with Kibana (the web interface). Then it came to creating alarms for the error keywords. What? How! Elastic.co asks money for this; the option is only available in the non-free versions.

Yes, I work at a big corporate company, but these days management is telling us to find free and open source versions of software. They even have a motto about it. 🙂

On the other hand, we also had lots of monitoring tools, and for this purpose we could have bought a license for a text monitoring add-on. But I had to solve it with free and open source software.

I couldn't give up Elasticsearch, because it was so easy to configure and so easy to visualize the logs with it. I started digging through the internet again, and yes, I found another solution: Grafana.

With this free and open source tool I got fancy graphs and an alerting system for my logs. Eureka, I had solved it. Now I would like to show you, step by step, how to do it.

Installation Steps;

I installed a clean CentOS 7.6 on a test machine. After that I installed the EPEL repo on the system.

  • sudo yum install epel-release
  • sudo yum update
  • sudo su -

Now I need to add the Elastic repos for the Elastic installations.

  • cd /etc/yum.repos.d/
  • vim elasticsearch.repo
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
  • vim kibana.repo
[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
  • vim logstash.repo
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
  • vim grafana.repo
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

Elastic products need a Java runtime (OpenJDK) to work; for this purpose I decided to use Amazon Corretto 8 as the OpenJDK distribution.

  • wget https://d3pxv6yz143wms.cloudfront.net/8.222.10.1/java-1.8.0-amazon-corretto-devel-1.8.0_222.b10-1.x86_64.rpm
  • yum install java-1.8.0-amazon-corretto-devel-1.8.0_222.b10-1.x86_64.rpm

Now I can install all the other tools.

  • yum install elasticsearch kibana logstash filebeat grafana nginx cifs-utils -y
  • systemctl start elasticsearch.service
  • systemctl enable elasticsearch.service
  • systemctl status elasticsearch.service
  • systemctl enable kibana
  • systemctl start kibana
  • systemctl status kibana

All Elastic products listen on localhost (127.0.0.1).

  • cd /etc/nginx/conf.d
  • vim serverhostname.conf
server {
    listen 80;

    server_name servername.serverdomain.local;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}

On my system I used nginx as a reverse proxy with basic password authentication for basic web site security.

I need to edit the /etc/nginx/htpasswd.users file to hold the user name and the hashed password.

I created the entry for my users with an online htpasswd generator web site. You can use your own choice of tool.

  • cd /etc/nginx/
  • vim htpasswd.users
admin:$apr1$1bdToKFy$0KYSsCviSpvcCzN9w1km.0
  • systemctl enable nginx
  • systemctl start nginx
  • systemctl status nginx
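If you would rather not type a password into a web site, the entry can also be generated locally. The usual tool is htpasswd from the httpd-tools package; as an alternative, here is a minimal Python sketch that writes a line in the salted SHA-1 ({SSHA}) scheme, which the nginx auth_basic documentation lists as supported. This is not how I created my own entry (mine is an apr1 hash), and the function names are only for illustration.

```python
import base64
import hashlib
import os


def ssha_htpasswd_line(user, password, salt=None):
    """Return a 'user:{SSHA}...' line for an nginx auth_basic_user_file."""
    salt = os.urandom(8) if salt is None else salt
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    encoded = base64.b64encode(digest + salt).decode("ascii")
    return f"{user}:{{SSHA}}{encoded}"


def verify_ssha_line(line, password):
    """Re-compute the hash from a stored line to check a password."""
    _user, hashed = line.split(":", 1)
    raw = base64.b64decode(hashed[len("{SSHA}"):])
    digest, salt = raw[:20], raw[20:]  # a SHA-1 digest is 20 bytes, the rest is salt
    return hashlib.sha1(password.encode("utf-8") + salt).digest() == digest
```

Printing ssha_htpasswd_line("admin", "your password") gives one line that can be pasted into /etc/nginx/htpasswd.users. Salted SHA-1 is not a strong password hash, so treat this as good enough for a test server only.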

My private test server is also in my private network, so I decided not to use the local firewall and SELinux policies.

  • systemctl stop firewalld
  • systemctl disable firewalld
  • systemctl status firewalld
  • vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
  • reboot

On my system the log files are stored on a Windows server's local disk. I needed a way to access them, therefore I decided to mount the SMB share on my local system.

  • vim /root/share_smb_creds
username=log_user
password=SecurePassword
  • useradd -u 5000 log_user
  • groupadd -g 6000 logs
  • usermod -G logs -a log_user
  • usermod -G logs -a kibana
  • usermod -G logs -a elasticsearch
  • usermod -G logs -a logstash
  • mkdir -p /mnt/logs
  • vim /etc/fstab
//s152a0000246/c$/App_Log_Files /mnt/logs cifs credentials=/root/share_smb_creds,uid=5000,gid=6000 0 0
  • reboot

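An important point: most Elastic configuration files (elasticsearch.yml, kibana.yml, filebeat.yml) are YAML formatted, while the Logstash pipeline files under conf.d use their own format, so please be careful about the format and the indentation of both the .conf and the .yml files.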

  • cd /etc/logstash/conf.d
  • vim 02-beats-input.conf
input {
  beats {
    port => 5044
  }
}
  • vim 30-elasticsearch-output.conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
  • systemctl enable logstash
  • systemctl start logstash
  • systemctl status logstash
  • vim /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /mnt/logs/*.txt

output.logstash:
  hosts: ["localhost:5044"]
  • systemctl enable filebeat
  • systemctl start filebeat
  • systemctl status filebeat -l
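A short note on the index option in the Logstash output: it creates one index per day, named after the Beat, its version and the event date (Logstash formats that date in UTC). The small Python sketch below only imitates that naming to show what the index names look like and why a pattern such as “filebeat*” matches them. The version number 7.3.0 in the usage note is just an example, not necessarily what your Filebeat reports.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch


def logstash_index(beat, version, event_time):
    """Imitate "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}".

    event_time must be a timezone-aware datetime; the date part is taken in UTC.
    """
    day = event_time.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"{beat}-{version}-{day}"


def matches_pattern(index_name, pattern="filebeat*"):
    """Check an index name against a wildcard index pattern."""
    return fnmatch(index_name, pattern)
```

For an event stamped 21 August 2019 (UTC), logstash_index("filebeat", "7.3.0", ...) gives filebeat-7.3.0-2019.08.21, which the “filebeat*” pattern picks up.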

If everything succeeds, you can connect to the Kibana web interface and manage your Elasticsearch system.

Kibana listens on localhost:5601, but we have set up nginx as a reverse proxy. When a connection comes to nginx, nginx asks for a username and password; if you authenticate successfully, your connection is forwarded to Kibana.
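What the browser actually sends to nginx after you type the username and password is a single HTTP header. A minimal Python sketch of how that header is built (the user name and password in the usage note are examples only):

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header used by HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

For example, basic_auth_header("admin", "secret") returns {"Authorization": "Basic YWRtaW46c2VjcmV0"}. Note that base64 is an encoding, not encryption, so on plain HTTP (listen 80, as in my test config) the password is readable on the network; that is acceptable on a private test network but not much further.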

Of course you need to research the graphs and visualizations yourself; these are only the basic moves.

Now we can run Grafana.

systemctl enable grafana-server
systemctl start grafana-server
systemctl status grafana-server

You can connect to Grafana with your browser at http://servername:3000

When you connect to the Grafana GUI you will see a welcome page. First you need to add a data source for Grafana to use.

I installed the latest version of the Elastic products, so I chose version 7+. As the index name you can use “filebeat*”. Save and test the configuration; if it succeeds, you can see the logs and metric information in Grafana too. 🙂

In the Logs tab of the Explore section, if you get an error like “Unknown elastic error response”, it means Elasticsearch sent too much data to Grafana and Grafana could not handle it. Narrow your time range to see the logs in Grafana. If you want to investigate the logs in detail, you have to use Kibana.

Now it is time to search the logs for errors and create an alert for your application team.

My application team gave me the error keywords for the logs, so I know what I should search for 🙂

Let's take an example: my error keyword is “WRNDBSLOGDEF0000000002”, so when that keyword is found in the logs of the last 15 minutes, I need to send an email to the application team.
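Behind the screens, the question “did this keyword appear in the last 15 minutes?” is a simple Elasticsearch query. Here is a minimal Python sketch that builds such a query for the _count API. It is only an illustration of the idea, not what Grafana sends internally, and it assumes the log line is stored in the message field (where Filebeat normally puts it) and that the index pattern is “filebeat*”.

```python
import json


def keyword_count_request(keyword, minutes=15, index="filebeat*", field="message"):
    """Return (path, JSON body) for an Elasticsearch _count request."""
    query = {
        "query": {
            "bool": {
                "filter": [
                    # The exact phrase must appear in the log line.
                    {"match_phrase": {field: keyword}},
                    # Only look at events from the last N minutes.
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m", "lte": "now"}}},
                ]
            }
        }
    }
    return f"/{index}/_count", json.dumps(query)
```

Sending the body of keyword_count_request("WRNDBSLOGDEF0000000002") to localhost:9200 with POST returns a count; anything above zero is the condition the alert should fire on.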

First things first: let's search for it in the logs with Kibana.

As you can see in my example, the error shows up in the Kibana search results.

You need to define the alert contact information in a Grafana notification channel.

Now we need to create an alert in Grafana for it. Please check my screenshots; you can see the details and how to do it step by step.

The Grafana alert settings are done. One last setting remains in the Grafana system configuration: how to send email via an SMTP server.

vim /etc/grafana/grafana.ini
[smtp]
enabled = true
host = smtpserver.domain.com
;user =
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
;cert_file =
;key_file =
skip_verify = true
from_address = grafana@domain.com
from_name = Grafana Alert
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = domain.com

[emails]
welcome_email_on_sign_up = true
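Alert mails silently fail if the [smtp] section is still commented out, so it is worth a quick check after editing. Here is a minimal Python sketch that reads the ini text and reports the obvious problems. It is a hypothetical helper, not a Grafana tool, and it only looks at the three keys used in my config above.

```python
import configparser


def check_smtp(ini_text):
    """Return a list of problems in the [smtp] section (empty list means it looks fine)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # A dummy section in front tolerates keys that appear before any [section] header.
    parser.read_string("[__top__]\n" + ini_text)
    if not parser.has_section("smtp"):
        return ["no [smtp] section"]
    smtp = parser["smtp"]
    problems = []
    if not smtp.getboolean("enabled", fallback=False):
        problems.append("smtp is not enabled")
    for key in ("host", "from_address"):
        if not smtp.get(key):
            problems.append(key + " is not set")
    return problems
```

Calling check_smtp(open("/etc/grafana/grafana.ini").read()) on the server should return an empty list once the settings above are in place. Lines starting with ; or # are treated as comments, the same way Grafana treats them.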

That’s it!!!

I have been using this system for about a month. My application teams are happy with it, and I am still improving it.

I will share new updates in future posts.

I hope it is useful for you too.

If you have any further questions or suggestions, please write a comment on this post.