This automated message is provided for the sanity and convenience of the Global Academics Research Facilities personnel

From one scientist to another (neither of them me):

—BEEP–

Good morning Dr [redacted], and welcome to the Global Academics Mutual Support and Self-Help system. This automated message is provided for the sanity and convenience of the Global Academics Research Facilities personnel. The time in the UK is 10:47 A.M. Current outside temperature is roughly 18 degrees Celsius, but feels like 8 thanks to high winds and a very exposed location. The estimated high for today is 24 degrees Celsius, which will probably feel like 12 for the very same reasons. The indoor areas are maintained at a chilly 15 degrees at all times.

This message is designed to ease your transition into the post-doc world. If your intended goal is to stay in research, you will need to return to your office and remind your bank that it’s no use waiting for more money to pass through your account, as no more will be forthcoming. If you have not yet handed in your brain at the door after burning it out during the thesis writing process, you must report to the medical service to have it replaced before you will be permitted into the high paper rate branch of the international research system.

A reminder that the Global Academics Board Game Night decathlon will commence this evening at 1900 hours around a table of your choice. The semi-finals for high paper rate personnel will be announced in a separate secure access transmission. Remember, more lives than your own may depend on your ability to tell a garden-variety zombie from a minion of Shub-Niggurath.

Do you have a friend or relative who would make a valuable addition to the Global Academics team? Tell them to run. A mile. Without looking back. Please contact your nearest university for further information. If you have an associate with a background in the areas of “Experimental Theology”, “Creative Geography”, or other sanity-questioning disciplines, please tell them they’d fit right in. The Global Academics Research Facility offers equally depressing opportunities for all employees.

A reminder to all Global Academics personnel. Regular light sensitivity and hemoglobin level screenings are a requirement of continued employment in the Global Academics Research Facility. Missing a scheduled light sensitivity or hemoglobin level check-up is grounds for immediate termination. If you feel you’ve been exposed to an excess of bright sunlight or other hazardous situations in the course of your duties, contact your vampire anti-defamation league representative immediately. Work lots, work nights. The future of your research may depend on it.

Now concluding the automated mutual support and self-help system message. Please have a seat, relax, and breathe deeply. Before getting started, be sure to check your office for zombie intruders. Thank you, and have a very sane, and paper-producing day.

—BEEP–

Library bookmark redesign

Here’s a little hobby project that I’ve been working on at the CERN Central Library. Instead of the familiar blue bookmark with only a title, the idea is to add anything that can be useful to library guests (and even strangers) and that will fit inside the space of the bookmark. Hopefully it can give more people an idea of what we can do for them, what they can do on their own, and how to find us, as well as provide a simple way to send feedback.

Work overview (italicized items are unfinished or undecided):

  • Title
    • Replace “CERN” with the official logo? Doesn’t look very good, and we wouldn’t be respecting the 25% margin rule.
  • Logos:
    • CERN
    • Library?
  • Due date, style:
    • Date stamp
      • Pros: Very fast to mark, usually good readability
      • Cons: Need to get one :)
    • Circle month + date
      • Pros: Fast to mark, medium readability
      • Cons: Computer science-y?
    • Text boxes
      • Pros: Easy to distinguish month from day
      • Cons: Legibility depends on the writer, slow, visual noise
    • Open space
      • Pros: Clean design with open space
      • Cons: Legibility depends on the writer, slow, could be too close to other text
  • Location code 52 1-052
  • Main web page
  • Email address
    • Desk
    • ILL?
  • Phone
  • Fax
  • Opening hours
  • Staffed hours
  • Other links:
  • CERN map:
    • Buildings
    • Main roads
    • Landmarks
      • Logos where possible, text elsewhere
      • Library
      • Restaurants
      • Reception
      • Entrances
      • P1
      • ATLAS
      • Globe
      • Hostels
    • Walking directions inside buildings
    • North arrow
  • Library map:
    • Both floors
    • Desk
    • Shop
    • Computers
      • Mark OS with logos
    • News shelf
    • Number ranges for shelves
    • Outdoor area
    • A/V equipment
    • Theses drawers
    • Reference section
    • Paper cutter
    • Printers
    • Copy machines
    • Return / delivery box
  • Feedback form
  • CC-BY license

Query CERN LDAP from the shell

Here’s one for the shell nuts:

sudo $EDITOR /etc/ldap/ldap.conf

Add the following line:

TLS_REQCERT never

Warning: This is not completely secure, since it ignores the certificate checks. There are instructions, but it’s not clear which of the two CA certificates I should use, and whichever I try, I get no useful feedback from ldapsearch even with -d 255. If you manage to use the certificates properly, I’d be grateful if you’d let me know how.

Now for the meat (replace $(whoami) with your CERN user name if it’s not the same as your login):

ldapsearch -v -H ldaps://ldap.cern.ch:636 -s sub -b O=CERN,C=CH -D cn=$(whoami),ou=users,o=cern,c=ch -x -W "(uid=$(whoami))"
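
For scripting, the same query can be assembled as an argument list. This is only a sketch in Python 3: the function name build_ldapsearch_args and the user name "jdoe" are made up for illustration, and nothing here contacts the server. The flags are the ones from the command above, with a comment on what each does.

```python
import shlex


def build_ldapsearch_args(user, host="ldap.cern.ch", port=636):
    """Return the ldapsearch argument list for looking up one user.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "ldapsearch",
        "-v",                                       # verbose output
        "-H", "ldaps://%s:%d" % (host, port),       # server URI (LDAP over TLS)
        "-s", "sub",                                # search the whole subtree
        "-b", "O=CERN,C=CH",                        # search base
        "-D", "cn=%s,ou=users,o=cern,c=ch" % user,  # bind DN
        "-x",                                       # simple authentication
        "-W",                                       # prompt for the password
        "(uid=%s)" % user,                          # search filter
    ]


# "jdoe" is a placeholder user name, not a real account.
print(" ".join(shlex.quote(arg) for arg in build_ldapsearch_args("jdoe")))
```

The list can be passed to subprocess.run as-is, which avoids having to quote the parentheses in the filter for the shell.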

Range arithmetic in Python

The XML 1.0 and 1.1 standards define some ranges of Unicode code points which are valid, and some “compatibility characters” which should not be used. CDS Invenio (a FOSS CMS) already has some code to clean up text to remove invalid characters, but it doesn’t remove the compatibility characters. Using the existing code for HTML 4.01 made the W3C Markup Validation Service complain, so I wanted to exclude the compatibility character ranges from the valid ranges, and get the most concise hexadecimal ranges corresponding to the resulting set to plug into a Python regular expression. Here’s the resulting sloppy and ugly code (I’ll post updated code and/or a link to the source repository if this is included at some point):

# -*- coding: utf-8 -*-
## Copyright (C) 2009 CERN.
##
## This file is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""Creates the minimal set of Unicode character ranges for valid XML 1.0 and 1.1
characters minus the compatibility characters"""

INCLUDE_XML10 = "#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] \
| [#x10000-#x10FFFF]"
EXCLUDE_XML10 = "[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF], \
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF], \
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF], \
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF], \
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF], \
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF], \
[#x10FFFE-#x10FFFF]"

INCLUDE_XML11 = "[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"
EXCLUDE_XML11 = "[#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84], [#x86-#x9F], \
[#xFDD0-#xFDDF], \
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF], \
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF], \
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF], \
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF], \
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF], \
[#x10FFFE-#x10FFFF]"

def cleanup(value):
    """Prepare string for conversion to hex ranges
    @param value: String with ranges
    @return: String with ranges"""
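    # Turn '#x9' into '0x9' and drop the brackets; chained replace() calls
    # behave the same on Python 2 and Python 3
    return value.replace('#', '0').replace('[', '').replace(']', '')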

def list_to_range(value):
    """Convert a list of strings (ranges and not)
    @param value: List of strings corresponding to hexadecimal numbers and
    ranges
    @return: List of numbers"""
    result = []
    for item in value:
        if item.find('-') == -1:
            result.append(int(item, 16))
        else:
            numbers = [int(hex_str, 16) for hex_str in item.split('-')]
            result.extend(range(numbers[0], numbers[1] + 1))
    return result

def range_minus(include_range, exclude_range):
    """Subtract one range from another
    @param include_range: String from http://www.w3.org/TR/xml/#charsets or
    http://www.w3.org/TR/xml11/#charsets
    @param exclude_range: Ditto
    @return: String with hex numbers and ranges"""
    include_range = cleanup(include_range)
    includes = include_range.split(' | ')

    exclude_range = cleanup(exclude_range)
    excludes = exclude_range.split(', ')

    include_numbers = list_to_range(includes)
    exclude_numbers = list_to_range(excludes)

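    # Set difference avoids a slow list lookup for every code point
    numbers = set(include_numbers) - set(exclude_numbers)
    # Sets are unordered, so sort both lists to make lows and highs pair up
    lows = sorted(
        number for number
        in numbers
        if number - 1 not in numbers)
    highs = sorted(
        number for number
        in numbers
        if number + 1 not in numbers)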

    result = zip(lows, highs)

    result_hex = [
        '\\U%0*X-\\U%0*X' % (8, pair[0], 8, pair[1])
        for pair in result]
    result_hex = [
        text.replace('-' + text[:10], '')
        for text in result_hex] # Single ranges

    result_hex = [
        text.replace('\\U0000', '\\u')
        for text in result_hex] # Shorten where possible

    return '\n'.join(result_hex)

print('XML 1.0:\n' + range_minus(INCLUDE_XML10, EXCLUDE_XML10) + '\n')

print('XML 1.1:\n' + range_minus(INCLUDE_XML11, EXCLUDE_XML11))
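
The code above expands every range into individual code points (over a million of them) before subtracting. A different approach, sketched here in Python 3, is to do the arithmetic on the (low, high) pairs directly. This is not what the script above does; the names merge_adjacent, subtract_ranges and format_range are made up for this sketch, and the XML 1.0 ranges are typed in as tuples rather than parsed from the spec strings.

```python
def merge_adjacent(ranges):
    """Merge overlapping or touching inclusive (low, high) ranges."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def subtract_ranges(includes, excludes):
    """Return the parts of includes not covered by excludes, as sorted ranges."""
    excludes = sorted(excludes)
    result = []
    for low, high in sorted(includes):
        start = low  # first code point not yet handled in this range
        for ex_low, ex_high in excludes:
            if ex_high < start:
                continue  # exclusion ends before the part still to handle
            if ex_low > high:
                break  # exclusions are sorted, so none of the rest overlap
            if ex_low > start:
                result.append((start, ex_low - 1))
            start = ex_high + 1
        if start <= high:
            result.append((start, high))
    return merge_adjacent(result)


def format_range(low, high):
    """Format a range using \\uXXXX for the BMP and \\UXXXXXXXX above it."""
    def escape(code_point):
        if code_point <= 0xFFFF:
            return "\\u%04X" % code_point
        return "\\U%08X" % code_point
    if low == high:
        return escape(low)
    return escape(low) + "-" + escape(high)


# XML 1.0 ranges from INCLUDE_XML10 and EXCLUDE_XML10, written as tuples.
XML10_INCLUDE = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
                 (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
XML10_EXCLUDE = [(0x7F, 0x84), (0x86, 0x9F), (0xFDD0, 0xFDEF)] + [
    ((plane << 16) + 0xFFFE, (plane << 16) + 0xFFFF)
    for plane in range(1, 17)]

for low, high in subtract_ranges(XML10_INCLUDE, XML10_EXCLUDE):
    print(format_range(low, high))
```

The work done depends on the number of ranges rather than the number of code points, so it stays fast however wide the ranges are.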

In case you just want the results, here you go:

XML 1.0:
\u0009-\u000A
\u000D
\u0020-\u007E
\u0085
\u00A0-\uD7FF
\uE000-\uFDCF
\uFDF0-\uFFFD
\U00010000-\U0001FFFD
\U00020000-\U0002FFFD
\U00030000-\U0003FFFD
\U00040000-\U0004FFFD
\U00050000-\U0005FFFD
\U00060000-\U0006FFFD
\U00070000-\U0007FFFD
\U00080000-\U0008FFFD
\U00090000-\U0009FFFD
\U000A0000-\U000AFFFD
\U000B0000-\U000BFFFD
\U000C0000-\U000CFFFD
\U000D0000-\U000DFFFD
\U000E0000-\U000EFFFD
\U000F0000-\U000FFFFD
\U00100000-\U0010FFFD

XML 1.1:
\u0009-\u000A
\u000D
\u0020-\u007E
\u0085
\u00A0-\uD7FF
\uE000-\uFDCF
\uFDE0-\uFFFD
\U00010000-\U0001FFFD
\U00020000-\U0002FFFD
\U00030000-\U0003FFFD
\U00040000-\U0004FFFD
\U00050000-\U0005FFFD
\U00060000-\U0006FFFD
\U00070000-\U0007FFFD
\U00080000-\U0008FFFD
\U00090000-\U0009FFFD
\U000A0000-\U000AFFFD
\U000B0000-\U000BFFFD
\U000C0000-\U000CFFFD
\U000D0000-\U000DFFFD
\U000E0000-\U000EFFFD
\U000F0000-\U000FFFFD
\U00100000-\U0010FFFD
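
As for plugging the ranges into a regular expression, here is one way it might look in Python 3, where the re module understands \uXXXX and \UXXXXXXXX escapes inside a pattern. It is a sketch, not the CDS Invenio code: the names XML10_VALID, invalid_char_pattern and strip_invalid are invented here, and the ranges are the XML 1.0 results listed above.

```python
import re

# Valid XML 1.0 ranges (inclusive), copied from the XML 1.0 results above.
XML10_VALID = [
    (0x9, 0xA), (0xD, 0xD), (0x20, 0x7E), (0x85, 0x85),
    (0xA0, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD),
] + [(plane << 16, (plane << 16) + 0xFFFD) for plane in range(1, 17)]


def invalid_char_pattern(ranges):
    """Compile a pattern matching any character outside the given ranges."""
    char_class = "".join("\\U%08X-\\U%08X" % pair for pair in ranges)
    # The leading ^ negates the class: match everything that is NOT valid
    return re.compile("[^" + char_class + "]")


INVALID_XML10 = invalid_char_pattern(XML10_VALID)


def strip_invalid(text):
    """Remove invalid and compatibility characters from text."""
    return INVALID_XML10.sub("", text)


# NUL, DEL and U+FDD0 are dropped; tab and newline are kept.
print(repr(strip_invalid("a\x00b\x7fc\ufdd0d\te\n")))
```

Negating the class of valid ranges keeps the pattern short, since the valid set is described by fewer ranges than everything outside it would need.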

Throw the fucking switch, Igor!

What would you like to have said, if it were you behind the “big green button” when the LHC starts in 2008? It’s the world’s most powerful particle accelerator, said to be the most complex machine ever built, and will most likely set the stage for the next level of theoretical physics, so it had better be in the “One small step” category.