Vincent Prouillet

Hey, it made my day !

Windows 8 Preview

Windows 8 was presented at the BUILD conference and a preview was available for download.
You can find the iso (with or without tools) here : .
I couldn’t get it to work in VMWare but it works well in VirtualBox.

There are several new things :
- Integration with Windows Live (you can put your Live mail and it will connect)

- Metro interface (my WLM picture appeared a bit later)

- Ribbons in explorer (you can go to the usual interface using the Desktop button or Windows Explorer button on the previous screen)

- A neat task manager (you can go back to the metro interface with the windows button in the bottom left)

- IE 10

Get capacity metrics for SQL Server and Oracle

Second round of the capacity metrics

SQL Server

Size of database :

EXEC sp_spaceused

Size of all the tables > 10mo :

--Removes the procedure if it already exists
IF OBJECT_ID('dbo.GetSizeOfAllTables') IS NOT NULL
DROP PROCEDURE dbo.GetSizeOfAllTables
GO
--Gets the number of rows and size in MB of every table in the current database
CREATE PROCEDURE GetSizeOfAllTables
AS
DECLARE @TableName VARCHAR(100)

--Cursor storing all the user tables names
DECLARE tableCursor CURSOR
FOR
SELECT [name]
FROM dbo.sysobjects
WHERE  OBJECTPROPERTY(id, N'IsUserTable') = 1
FOR READ ONLY

--We create a temp table to hold the result of sp_spaceused
CREATE TABLE #TempTable
(
    tableName varchar(100),
    numberOfRows varchar(100),
    reservedSize varchar(50),
    dataSize varchar(50),
    indexSize varchar(50),
    unusedSize varchar(50)
)

OPEN tableCursor
--Get the first table name from the cursor
FETCH NEXT FROM tableCursor INTO @TableName
WHILE (@@Fetch_Status >= 0)
BEGIN
    --Dump the results of the sp_spaceused query to the temp table
    INSERT  #TempTable
        EXEC sp_spaceused @TableName

    FETCH NEXT FROM tableCursor INTO @TableName
END
CLOSE tableCursor
DEALLOCATE tableCursor

--Select all records and compute the data we want
SELECT * FROM
    (SELECT tableName, RTRIM(numberOfRows) as 'rows', ROUND((CONVERT(FLOAT, SUBSTRING(dataSize, 1, LEN(dataSize)-3)) + CONVERT(FLOAT, SUBSTRING(indexSize, 1, LEN(indexSize)-3)))/1024, 3) as 'Size'
    FROM #TempTable) T
WHERE Size > 10

--Clean up behind us
DROP TABLE #TempTable
GO
EXEC GetSizeOfAllTables

Current users

--Create a table to hold the data from sp_who
CREATE TABLE #temp
(
    spid varchar(100),
    ecid varchar(100),
    statuts varchar(100),
    loginame varchar(100),
    hostname varchar(50),
    blk varchar(50),
    dbname varchar(50),
    cmd varchar(50),
    requestID varchar(50),
);

INSERT INTO #temp EXEC sp_who

--Get the data we want
SELECT COUNT(*) AS cnt, loginame FROM #temp WHERE dbname = 'OnlyForTest' GROUP BY loginame ORDER BY cnt DESC;

--Clean up
DROP TABLE #temp

Current queries

SELECT COUNT(*) AS cnt, text FROM sys.dm_exec_requests
CROSS APPLY sys.dm_exec_sql_text(sql_handle) WHERE status='running' GROUP BY text ORDER BY cnt DESC;

Oracle

Size of the instance

SELECT sys_context('USERENV','DB_NAME'), SUM(bytes)/1024/1024 "Mb" FROM dba_data_files

Size of the tablespaces

SELECT b.tablespace_name, tbs_size SizeMB, a.free_space FreeMB
FROM  (SELECT tablespace_name, round(SUM(bytes)/1024/1024 ,2) AS free_space
FROM dba_free_space
GROUP BY tablespace_name) a,
(SELECT tablespace_name, SUM(bytes)/1024/1024 AS tbs_size
FROM dba_data_files
GROUP BY tablespace_name) b
WHERE a.tablespace_name(+)=b.tablespace_name

Size of the schemas (only the one above 10MB)

SELECT owner, (SUM(bytes)/1024/1024)
FROM dba_segments WHERE owner NOT IN ('SYS')
GROUP BY owner HAVING (SUM(bytes)/1024/1024) > 10

Size of the tables (only the one above 10MB)

SELECT owner, table_name, TRUNC(sum(bytes)/1024/1024) Meg
FROM
    (SELECT segment_name table_name, owner, bytes
    FROM dba_segments
    WHERE segment_type = 'TABLE'
    UNION ALL
    SELECT i.table_name, i.owner, s.bytes
    FROM dba_indexes i, dba_segments s
    WHERE s.segment_name = i.index_name
    AND   s.owner = i.owner
    AND   s.owner = i.owner
    AND   s.segment_type = 'INDEX'
    UNION ALL
    SELECT l.table_name, l.owner, s.bytes
    FROM dba_lobs l, dba_segments s
    WHERE s.segment_name = l.segment_name
    AND   s.owner = l.owner
    AND   s.segment_type = 'LOBSEGMENT'
    UNION ALL
    SELECT l.table_name, l.owner, s.bytes
    FROM dba_lobs l, dba_segments s
    WHERE s.segment_name = l.index_name
    AND   s.owner = l.owner
    AND   s.segment_type = 'LOBINDEX')
WHERE owner NOT IN ('SYS')
GROUP BY table_name, owner
HAVING SUM(bytes)/1024/1024 > 10
ORDER BY SUM(bytes) desc

Current users and queries

SELECT COUNT(*) AS cnt, a.username, b.sql_text
FROM v$session a, v$sqlarea b
WHERE sql_address=b.address
GROUP BY a.username, b.sql_text

Install ibm_db on Ubuntu

Ok so this one was a bit tricky.
I needed to access remote DB2 databases using Python.
I found ibm_db (page here) and followed the instructions.

First you need to set up DB2 drivers : you can get it here (Data Server Driver Package, NOT the ODBC) or here here (I used this one). You will need to create an account for both.
You can then untar the archive, select the good folder (odbc_cli_driver/linuxia32) and untar the archive present there, for example in /home/vincent/db2driver .

If you try to build and install ibm_db right now it will tell you to set up the DB2 environment so let’s do it.

vincent@ubuntu:~$ export IBM_DB_LIB=/home/vincent/db2driver/lib
vincent@ubuntu:~$ export IBM_DB_DIR=/home/vincent/db2driver

You should now be install to build install ibm_db :

vincent@ubuntu:~$ cd /home/Downloads/ibm_db/
vincent@ubuntu:~/home/Downloads/ibm_db/$ python setup.py build
vincent@ubuntu:~/home/Downloads/ibm_db/$ python setup.py install

Try to import it in python and you should get the following error :

vincent@ubuntu:/home/vincent/Downloads/ibm_db-1.0.4# python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibm_db
Traceback (most recent call last):
  File "", line 1, in
ImportError: libdb2.so.1: cannot open shared object file: No such file or directory

Huuum still not working…after a little googling I found how to resolve it in this issue .

You have to do the following :

vincent@ubuntu:~/Downloads/ibm_db-1.0.4$ sudo nano /etc/ld.so.conf.d/db2.conf
[sudo] password for vincent:
vincent@ubuntu:~/Downloads/ibm_db-1.0.4$ sudo ldconfig
vincent@ubuntu:~/Downloads/ibm_db-1.0.4$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibm_db
Traceback (most recent call last):
  File "", line 1, in
ImportError: libstdc++.so.5: cannot open shared object file: No such file or directory

Install the libstdc++5 package and voila!

vincent@ubuntu:~/Downloads/ibm_db-1.0.4$ sudo apt-get install libstdc++5
....
incent@ubuntu:~/Downloads/ibm_db-1.0.4$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibm_db
>>>

It should work now !

Get capacity metrics for MySQL and PostgreSQL

Getting capacity metrics for these two dabatases is suprisingly easily. We will get the size of database(s), current users connected and current queries made (and also the size for each tablespace for PostgreSQL).

MySQL

Size of database :

SELECT LOCALTIMESTAMP, table_schema AS 'Name', ROUND( SUM( data_length + index_length ) /1024 /1024, 3 ) AS 'Size (MB)'
FROM information_schema.tables
WHERE table_schema = 'name_of_our_database'
GROUP BY table_schema

Size of all the tables :

SELECT LOCALTIMESTAMP, table_name, table_rows,
ROUND(data_length/1024/1024,2) + ROUND(index_length/1024/1024,2) FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema','mysql')

Current users and queries

SELECT LOCALTIMESTAMP, COUNT(*) AS cnt, user, host, info
FROM INFORMATION_SCHEMA.PROCESSLIST GROUP BY user, info ORDER BY cnt DESC;

If you have the slow query (doc here) activated, it’s really worth getting the data from it.

PostgreSQL

Size of all the databases from the instance (we don’t use pg_size_pretty because we want all our reports to use the same unit and we exclude the basic databases)

SELECT localtimestamp, pg_database.datname, (pg_database_size(pg_database.datname)::float/1024/1024)::numeric(8,3)
FROM pg_database WHERE datname NOT IN ('template0','template1','postgres')

Size of the tablespaces (same thing as before)

SELECT localtimestamp, spcname, (pg_tablespace_size(oid)::float/1024/1024)::numeric(8,3)
FROM pg_tablespace WHERE spcname != 'pg_global';

Size of the tables (only the one above 10MB)

SELECT * FROM (
    SELECT localtimestamp, nspname || '.' || relname AS "table", reltuples,
    (pg_total_relation_size(C.oid)::float/1024/1024)::numeric(8,3) AS "size"
    FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast') AND C.relkind <> 'i') T
WHERE size > 10

Current users and queries

SELECT localtimestamp, COUNT(*) AS cnt, usename, current_query
FROM pg_stat_activity GROUP BY usename,current_query ORDER BY cnt DESC;

Install Oracle on CentOS 6

I followed the quick installation guide Oracle provides so that’s more like an even quicker installation guide.

To start, we need to install the necessary packages :

[root@oracletest database]# yum install binutils compat-libstdc++* elfutils* gcc gcc-c++ glibc* compat-glibc* ksh libaio* libstdc++* make sysstat unixODBC unixODBC-devel

We now need to create the users and groups :

[root@oracletest database]# groupadd oinstall
[root@oracletest database]# groupadd dba
[root@oracletest database]# useradd -g oinstall -G dba oracle
[root@oracletest database]# passwd oracle

Tune the kernel a bit :

[root@oracletest database]# nano /etc/sysctl.conf
//Then add these lines at the end of the file
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
//And validates it
[root@oracletest database]# /sbin/sysctl -p

Change the shell limits :

[root@oracletest database]# nano /etc/security/limits.conf
//Then add these lines at the end of the file
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536

[root@oracletest database]# nano /etc/pam.d/login
//Then add this line at the end of the file
session required pam_limits.so

Modify the bash profile :

[root@oracletest ~]# nano /home/oracle/.bash_profile
//Then add these lines at the end of the file
ulimit -u 16384 -n 65536
umask 022
ORACLE_BASE=/u01/app; export ORACLE_BASE
ORACLE_SID=orcl; export ORACLE_SID
ORACLE_HOME=$ORACLE_BASE/product/11.2.0/db1 #depends on your directory
PATH=$ORACLE_HOME/bin:$PATH; export PATH

Create the directories :

[root@oracletest database]# mkdir -p /u01/app/
[root@oracletest database]# chown -R oracle.oinstall /u01

And for the X (you might have to open a new console for that to work):

[root@oracletest vincent]# xhost +

You can now su – oracle, run the ./runInstaller script and follows what the installer says (a couple of scripts to run at the end).
It might tell that you don’t have the necessary packages, but it somehow doesn’t recognize higher versions or 64bits versions of these one so you can safely continue.
You can now access EM on https://hostname:1518/em .

Install CentOS 6 using the remote Net Install

Instead of downloading the 4.x go for the DVD iso, you could download the Net Install (about 216mo) and let it retrieves what it needs to install.

You can download the iso there http://www.centos.org/modules/tinycontent/index.php?id=30 in the 6 or 6.0 directory.

Once you load it on your server or VM, you will have to choose the media from which to install CentOS.
Choose URL and you now have 2 options : use one of your image in your network or use a CentOS mirror.

The CentOS mirrors URL are the following (choose the one according to your architecture) :

http://mirror.centos.org/centos/6/os/i386/

http://mirror.centos.org/centos/6/os/x86_64/

It will now retrieve the image and start the installation with the GUI.

Cleaning Postgres WAL archive directory

The disk containing the WAL (Write-Ahead Logging explained here : http://www.postgresql.org/docs/8.0/static/wal.html) at work was getting close to 90% since we didn’t have any kind of deleting obsolete WAL files.
A WAL directory is filled with files with names like 000000010000000000000034 but a few looks like 000000010000000000000065.007C9330.backup.
The ones with the backup extension indicate a new backup and the starting WAL file would be 000000010000000000000065 in this example (the second part of the filename can be ignored).
We can then safely delete the files older than this one.

I found after having spent some time on the script that pg_archivecleanup (http://developer.postgresql.org/pgdocs/postgres/pgarchivecleanup.html) existed, but our server is running PostgreSQL 8.3 and I had some library problems when compiling it as a standalone so I just finished my ksh script.
I delete all the files older than the most recent starting WAL file and log every deletion with the timestamp of file creation.

#!/bin/ksh

archive_directory=/home/vincent/postgres/archive #put your own directory obviously
backup_labels=$archive_directory/*.backup
files=$archive_directory/*
log_path=/home/vincent/logs/clean.log #a made-up directory, you get the idea

if [ ! -f $log_path ];then
	touch $log_path
	echo -e "Cleaning of WAL archives log\n\n" >> $log_path
fi

#Gets the last backup label created
latest_file_path=`ls -tr $backup_labels | tail -1`
#Gets the name of the start file
start_wal=$(echo `basename $latest_file_path` | cut -d'.' -f1)

current_time=`date +"%Y-%m-%d %H:%M"` 

echo -e "\nCleaning started at : ${current_time}" >> $log_path
echo -e "\tKeep WAL file ${start_wal} and later\n" >> $log_path

for file in $files
do
	if [ $file -ot $latest_file_path ]; then
		time=`ls -l $file | awk '{ print $6, $7 }'`
		rm $file
		echo "Removing file ${file} (created at ${time})" >> $log_path
	fi
done

echo -e "\n--------------------------------------------------" >> $log_path

Load various excel files into MySQL

I recently had to write a python script that was unzipping a an archive containing several excel files and then load the data of a specific sheet into a MySQL 5 database. Doing it with only SQL was not an option since it required some manual actions (saving the sheet as csv, which resulted in the loss the data from formulas including other sheets, corrected by copying/pasting the data beforehand and columns containing commas or other type of char).

I used xlrd as the API to read the excel files and the zipfile module to unzip.

My first thought was to parse each line and insert the whole file in a single really big INSERT. Needless to say it was really slow.
I then thought of converting the xls files to csv files and then use the LOAD INTO syntax from MySQL, which is quite fast (~3s for 30k lines).

I found a script doing close to what I wanted here so I took the core and tweaked it a bit : choose a sheet, a delimiter and remove unwanted characters.

For the example, let’s say we have 2 tables (table1 and table2) to load into (indentation are whitespace, not tab) :

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import xlrd
import csv
import MySQLdb
import os
import re
import zipfile

#Gets and formats the data from the sheet
def sheet_to_dictionary(excel_file, sheet_number):
   book = xlrd.open_workbook(excel_file)
   formatter = lambda(t,v): format_excel_value(book,t,v)
   raw_sheet = book.sheet_by_index(sheet_number)
   data = []

   for row in range(raw_sheet.nrows):
      (types, values) = (raw_sheet.row_types(row), raw_sheet.row_values(row))
      data.append(map(formatter, zip(types, values)))

   return ({ 'sheet_data': data })

#Format Excel date to isodate
def exceldate_to_isodate(datetuple):
   (y,m,d, hh,mm,ss) = datetuple
   nonzero = lambda n: n!=0
   date = "%04d-%02d-%02d" % (y,m,d) if filter(nonzero,(y,m,d)) else ''
   return date

#Clean the excel formatting
def format_excel_value(book, type, value):
   if type == 2: # NUMBER
      if value == int(value): value = int(value)
   elif type == 3: # DATE
      datetuple = xlrd.xldate_as_tuple(value, book.datemode)
      value = exceldate_to_isodate(datetuple)
   elif type == 5: # ERROR
      value = xlrd.error_text_from_code[value]
   elif type == 1 and '\n' in value: #There was some \n characters in my files that i needed to remove
      value = value.replace('\n',' ')
   return value

#UTF-8 the strings
def utf8ize(l):
   return [unicode(s).encode("utf-8") if hasattr(s,'encode') else s for s in l]

#Creates the CSV
def create_csv(table, file_path, delimit):
   file = open(file_path, 'wb')
   csvout = csv.writer(file, delimiter=delimit, doublequote=False, escapechar='\\')
   csvout.writerows(map(utf8ize, table))

#Load the table1 CSV into the table1 table
def load_table1(file_path):
   sql = """LOAD DATA LOCAL INFILE '%s'
   INTO TABLE table1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES
   (col1, col2, col3)
   """ % (file_path)
   execute_sql(sql)

#Load the table2 CSV into the table2 table
def load_table2(file_path):
#Using ';' as a delimiter since there are ',' in the rows
   sql = """LOAD DATA LOCAL INFILE '%s'
   INTO TABLE table2 FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n' IGNORE 1 LINES
   (col1, col2, col3)
   """ % (file_path)
   execute_sql(sql)

def execute_sql(sql):
   connection = MySQLdb.connect(host = "127.0.0.1", user = "root", passwd = "password", db = "test")
   cursor = connection.cursor()
   cursor.execute(sql)
   cursor.close
   connection.commit()
   connection.close

#Unzip the archive containing the excel files
re_zip = re.compile(r'(\.zip)$')
zip_filename = filter(re_zip.search, os.listdir(".")) #Using the script directory, but you could easily change that

#Should have only one zip file in the directory
z = zipfile.ZipFile(zip_filename[0])
names = z.namelist()
z.extractall()
z.close()

#Loads the xls data into the database and removes the files when we're done with them
for filename in names:
   if "table1" in filename:
      csv_name = 'table1.csv'
      table = sheet_to_dictionary(filename, 3)
      create_csv(table['sheet_data'], csv_name, ',')
      load_non_mandate(csv_name)
      os.remove(filename)
      os.remove(csv_name)
   if "table2" in filename:
      csv_name = 'table2.csv'
      table = sheet_to_dictionary(filename, 9)
      create_csv(table['sheet_data'], csv_name, ';') #There was some commas in the columns, preventing me to use a basic CSV
      load_time_reporting(csv_name)
      os.remove(filename)
      os.remove(csv_name)

os.remove(zip_filename[0])
#When the script is finished, data is loaded in the tables and all the files, including the zip are deleted, you could also
#archive them if you want

SQL Expert 101 3/X : DDL and DML

This article is the third of a series of articles aiming to prepare me (and you) to pass the 1Z0-047 Oracle Database SQL Expert.

This part covers DDL (like creating table, adding constraints) and DML (insert/update/delete data).
Maybe I should have done it in first but anyway let’s begin.

Summary :

  1. Using DDL Statements to Create and Manage Tables
  2. Manipulating Data

Read the full article »

RMAN

RMAN (Recovery Manager) is the the backup and recovery tool recommended by Oracle.

There are several ways to create RMAN jobs : EM interface, command-line and scripts.Before the technical stuff, we should learn the backup vocabulary.
Read the full article »