Migrating to Community-driven Infrastructure
Introduction
[up to date as of 2013-02-08]
Although Nokia's plans to discontinue Maemo support had been known since spring 2012, Nokia did not give the "Go" to Nemein (the service provider acting on Nokia's behalf) for the actual migration work until 2 weeks before Christmas 2012.
As of January 18th, 2013 the *.maemo.org infrastructure has been consolidated from 20+ physical servers (aka "irons") to the current config and completely migrated to new locations independent of Nokia servers. This task was accomplished by Nemein. The talk.maemo.org forum has been integrated with the rest of the infra; many thanks to Nemein for donating the VM for that. Also many thanks to Nemein for this incredible piece of work, done at a time when others (as well as the guys there) are usually already away for the winter holidays.
The current setup (see below) consists of around 10 Virtual Machines hosted by Nemein on their xen-grid. This is an interim solution. Nokia paid Nemein for this consolidation/migration and hosting until end of February.
Handing over control of the servers is still pending; right now (2013-01-30) Nemein and affiliates still control that infra.
Transfer of control over the (*.)maemo.org DNS entries ("the domain") is still being negotiated between Nokia and HiFo; all DNS changes so far have been made by Nokia's dnsmaster at Nemein's request.
The plans of council and the HiFo board so far are: kindly ask Nemein to have *.maemo.org nicely bundled. We hope for this setup to be free of major known bugs (i.e. autobuilder working, repository working albeit maybe slow) when Nemein hands us the package.
[2013-02-08] Negotiations about direct migration to one of our 3 options (see below) are ongoing.
further plans, state of migration
(obsolete. thus deleted. See wiki history if interested in what happened when)
This page is intended as a central place where status and other operational information can be gathered.
Plan for migration / Timeline [2013-03-15]
- Friday, 22.2. (falk)
- Rack Hardware @ IPHH - Hardware is racked
- Install base system (CentOS 6.3 with patches from xes)
- Saturday, 23.2. (xes/falk)
- Start migrating repository.m.o
- Start migrating VMs with static data
- ... (hidden DNS master set up)
- sync databases, switch DNS entries
- DNS switched [by Nokia] to new IPs on 2013-03-14 1700UTC. Final sync established 1900UTC. Since then the machines are up and running on *new*
VMs we need to migrate:
Name | Disk Size | Location of act. instance | migrated? | Comments on *new* instance |
---|---|---|---|---|
static | 30G | nemein | synced+up | works |
wiki | 20G | nemein | synced+up | works |
repository | 900G | nemein | synced+up | We need to check the disk size, this might be too big for current hw, maybe split tablets-dev off. |
 | 20G | nemein | synced+up | also has lists |
scratchbox | 100G | iphh | setup! | will be set up new |
vcs | 50G | nemein | synced+up | has NFS mounts from garage and repository (copying) |
garage | 100G | nemein | synced+up | has NFS mounts from stage and vcs (copied, seems to work) |
db | 100G | nemein | synced+up | works, needs tuning |
builder | 50G | nemein | copied+up | still needs fixing in several aspects |
talk | 20G | nemein | synced+up | up since 2013-03-13, via HTTP-forward |
dns | ?? | iphh | setup! | dns records/serial incomplete, bind inactive |
State of final migration
All VMs have been migrated to the IPHH server; DNS is still owned and managed by Nokia [2013-05-29]
Setup with IPHH
Networks
We have 2 /28 Subnets (213.128.137.0/28 and 213.128.137.16/28)
Networks are configured as follows:
IPv4 | IPv6 | VLAN | Xen Bridge | default GW |
---|---|---|---|---|
213.128.137.0/28 | not yet | 1 | xenbr0 | 213.128.137.14 |
213.128.137.16/28 | not yet | 2 | xenbr1 | 213.128.137.17 |
10.0.1.0/24 | not yet | 3 | xenbr2 | 10.0.1.1 |
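As a quick sanity check of the plan above, the following sketch tests whether an address lies inside a subnet, for example that each default GW belongs to its own network. The function names `ip_to_int` and `in_subnet` are made up for this illustration and are not part of the actual setup; it assumes a POSIX shell with 64-bit arithmetic.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    printf '%s\n' $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet ADDRESS NETWORK/PREFIX
# Returns 0 if ADDRESS lies inside the given network, 1 otherwise.
in_subnet() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}
```

Example: `in_subnet 213.128.137.14 213.128.137.0/28 && echo ok` checks the vlan 1 gateway against its subnet.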
IP Plan for vlan 1
IPv4 | IPv6 | Hostname |
---|---|---|
213.128.137.1 | n/a | firewall-carp |
213.128.137.2 | n/a | firewall-a |
213.128.137.3 | n/a | firewall-b |
213.128.137.4 | n/a | blade-a |
213.128.137.5 | n/a | blade-b |
213.128.137.6 | n/a | portforwarding for monitor |
213.128.137.7 | n/a | |
213.128.137.8 | n/a | |
213.128.137.9 | n/a | |
213.128.137.10 | n/a | |
213.128.137.11 | n/a | |
213.128.137.12 | n/a | IPHH Router 1 |
213.128.137.13 | n/a | IPHH Router 2 |
213.128.137.14 | n/a | IPHH-VRRP |
IP Plan for vlan 2
IPv4 | IPv6 | Hostname | Aliases |
---|---|---|---|
213.128.137.17 | n/a | firewall-carp | - |
213.128.137.18 | n/a | firewall-a | - |
213.128.137.19 | n/a | firewall-b | - |
213.128.137.20 | n/a | www | static, maemo.org, planet, downloads |
213.128.137.21 | n/a | wiki | bugs |
213.128.137.22 | n/a | repository | stage |
213.128.137.23 | n/a | lists | |
213.128.137.24 | n/a | scratchbox | - |
213.128.137.25 | n/a | vcs | drop |
213.128.137.26 | n/a | garage | - |
213.128.137.27 | n/a | builder | - |
213.128.137.28 | n/a | talk | - |
213.128.137.29 | n/a | DNS | - |
213.128.137.30 | n/a | - | - |
IP Plan for vlan 3
IPv4 | IPv6 | Hostname |
---|---|---|
10.0.1.1 | n/a | firewall-carp |
10.0.1.2 | n/a | firewall-a |
10.0.1.3 | n/a | firewall-b |
10.0.1.10 | n/a | db |
10.0.1.11 | n/a | monitor |
10.0.1.200 | n/a | blade-a/IPMI |
10.0.1.201 | n/a | blade-b/IPMI |
10.0.1.202 | n/a | maemo-switch |
Disk Layout of blade-[ab]
Both disks have the following partitioning:
RAID1 Volume for /boot (/dev/md0), consisting of /dev/sda1 and /dev/sdb1 (200M)
RAID1 Volume /dev/md1 consisting of /dev/sda2 and /dev/sdb2 (around 970G) The RAID1 Volume contains a physical LVM volume. We only have one VolumeGroup (vg_blade[ab]), which has LogVol00 with 20G as root volume, LogVol01 with 2 Gig as swap and vmstore with the rest as VM Storage mounted on /vmstore.
Tips & Tricks for migration
Copying:
Create an image on vmhost
fallocate -l 200g image.img
or, in case fallocate is unavailable
dd if=/dev/zero of=image.img bs=1 count=1 seek=200G
Attach as loop-device
losetup -f image.img
(find the loop-device and create a filesystem on it)
Copy stuff
tar --create -p -j --one-file-system . | pv -br | ssh root@host 'cd /mountpoint ; tar xpj'
or
cd / ; rsync -arvSxz . root@host:/mount/point
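The image creation step above, with its fallback, can be wrapped in a small helper. This is only a sketch: the function name `make_image` and the choice of MiB as the size unit are assumptions for illustration. Note that fallocate preallocates the blocks, while the dd fallback (with count=0) merely extends the file to the requested size without writing data.

```shell
# make_image FILE SIZE_IN_MIB
# Create an image file of the given size, preferring fallocate and
# falling back to dd where fallocate is missing or unsupported.
make_image() {
    img=$1
    size_mb=$2
    if command -v fallocate >/dev/null 2>&1 &&
       fallocate -l "${size_mb}MiB" "$img" 2>/dev/null; then
        return 0
    fi
    # count=0 writes nothing; seek extends the file to size_mb * 1 MiB
    dd if=/dev/zero of="$img" bs=1M count=0 seek="$size_mb" 2>/dev/null
}
```

For the 200G image from the example, this would be called as `make_image image.img 204800`.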
Stuff to do [2013-03-15]
- Implement a proper service monitoring for all machines and applications - nagios pending, http://monitor.maemo.org/ganglia/
- Setup a common policy for root/user accounts and sudo permissions
- Change root-passwords - done
- Make SSH root-login key-only - done?
- Find out, what to sync for final migration - done
- Configure internal DNS server in /etc/resolv.conf
- Coordinate DNS setup with Nokia - partially done
- Consolidate Databases - WIP
- Add disks to system - done, 4TB on blade-a
- Setup bugtracking system for infrastructure - done: roundup?
- fix NFS mounts - WIP
- update VMs to 3.2.0-38
Problems we walked into
Machines throwing their network away
Apparently, XEN has issues if a vm sends too many/too large network packets.
http://lists.xen.org/archives/html/xen-devel/2013-01/msg00198.html has an interesting read about that problem.
Symptom:
xenbr1: port 8(vif51.0) entered forwarding state
vif vif-51-0 vif51.0: Too many frags
vif vif-51-0 vif51.0: fatal error; disabling device
xenbr1: port 8(vif51.0) entered disabled state
in dmesg
Temporary fix: Disable all offloading on eth0
for i in rx tx sg tso gso gro lro; do ethtool -K eth0 $i off; done
Source of this problem:
We fixed that problem on our machines by ensuring dom0 and domU use same MAX_SKB_FRAGS
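To spot affected guests quickly, the kernel log can be filtered for the interfaces that got disabled. A minimal sketch: the function name `affected_vifs` is hypothetical, and the pattern is based only on the dmesg excerpt shown under "Symptom", so other kernel versions may word the message differently.

```shell
# Read kernel log text on stdin and print each vif that was disabled
# after a fatal error, one name per line, without duplicates.
affected_vifs() {
    sed -n 's/.* \(vif[0-9][0-9]*\.[0-9][0-9]*\): fatal error; disabling device.*/\1/p' |
        sort -u
}
```

Typical use on dom0 would be `dmesg | affected_vifs`.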
Inventory
As a first step we gathered information about the former infrastructure at *.maemo.org. This "inventory" provided an overview about all components of the infrastructure as well as information that would later on aid during the migration.
The following topics were considered important for the migration:
- Legal Issues (Names, Trademarks, Domain Names, etc.)
- Infrastructure (Web Site, Forum, Wiki, Autobuilder, Mailinglists, Garage, etc.)
Legal Issues
What is the state about the name "Maemo"?
"... Maemo is currently a registered trademark of Nokia and the domain name is owned by Nokia."
Who owns "maemo.org"?
Negotiations about domain ownership are still ongoing between the Hildon Foundation board and Nokia (2013-01-20). If the community can't get control over the DNS, we might revert to maemocommunity.org.
Domain ID:D105692361-LROR
Domain Name:MAEMO.ORG
Created On:07-Feb-2005 16:26:32 UTC
Last Updated On:07-Jan-2013 10:25:55 UTC
Expiration Date:07-Feb-2014 16:26:32 UTC
Sponsoring Registrar:MarkMonitor Inc. (R37-LROR)
Registrant ID:mmr-31461
Registrant Name:Nokia Corporation
Registrant Organization:Nokia Corporation
Registrant Street1:P.O.Box 226
Registrant Street2:Nokia Group
Registrant Postal Code:00045
Registrant Country:FI
Registrant Phone:+358.718008000
Registrant FAX:+358.718034496
Registrant Email:dnsauthority@nokia.com
We're planning to ask Nokia to allow a hidden primary [1] for maemo.org, which we will host on a persistent VM (dns) sponsored by Nemein (thanks Eero! :-D ). The purpose is to allow swift changes of IPs under maemo.org without bothering Nokia's DNSmaster, as long as the domain still belongs to Nokia. Once the domain gets transferred to HiFo, this will become less useful, but it won't cause any problem either. In 6 months or so we can consider tearing down the hidden primary and managing our domain directly.
What is needed for the community to run maemo.org?
TMO forums donated to Hildon Foundation: http://maemo.org/community/board/tmo_forums_donated_to_hildon_foundation/
What are the costs?
Nokia paid for hosting until end of February. Current (2013-01-30) interim config (VM on Nemein's xen-grid) will cost 1300EUR/month for the VM, plus 2200EUR/month for the maintenance. For the colocation rackspace, traffic, energy etc of the iron(s) Nokia donates to community there will be another 500+EUR/month. All excl VAT.
At the end of February we hope to drop the xen-grid VMs, since they should be running virtualized on our own iron by then.
If you're willing to donate, please visit http://hildonfoundation.org/support/
What about the personal information of the users?
Please refer to the privacy policy posted on the website. If you want to know what data is stored about you inside *.maemo.org, or want this data / your account permanently deleted, please contact council@maemo.org
Operational Platform
[2013-03-20] All of maemo.org is running on our supermicro server colocated at IPHH
List of hardware Nokia will donate to HiFo, according to Nemein's plans. [2013-02-08]
ID | Hostname | Mgmt IP Address | OOB Mgmt IP Address | Type (Virtual / Baremetal) | System Admin | HW Vendor | HW Model | Form Factor | CPU | Memory | Disk | Acquisition Date | Warranty | Services | Comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01 | blade-a.maemo.org | Baremetal | Falk(warfare) | Supermicro | http://www.supermicro.nl/products/system/2u/2027/SYS-2027TR-HTRF.cfm?parts=SHOW | 2U 19" Rackmount | Intel® Xeon® processor E5-2620 | 32GB | (raid1:2*)1TB, 2*2TB=4TB aux. | 3 years | Falk (for HH CoLo) | only 2 of the 4 blades populated | |||
02 | blade-b.maemo.org | Baremetal | Intel® Xeon® processor E5-2620 | 32GB | (raid1:2*)1TB |
OS and virtualization on community iron (planning, discussion)
Please don't forget to tag your contributions with your nick!
Server OS
alternative A
blabla-OS
alternative B
alternative C
Virtualization
alternative A
XEN (with OS blabla of above)
alternative B
VMware
alternative C
Services
The following table is intended to give a concise and easily perceivable overview of the *.maemo.org services. Please use the next sub-section for providing more detailed information.
Resource | URL (If Applicable) | Migration Status (DONE/WIP/NST) | Service Maintainer | System Admin | Software Name | Software Version | Software License | Known Issues | Last status update |
---|---|---|---|---|---|---|---|---|---|
Maemo Main Web Site | http://www.maemo.org | BUGS | ? | Nemein | orphaned links/404s: http://maemo.org/community/council/system_operator_needed/; Login doesn't work | 2013-01-25 | |||
Maemo Forums | http://talk.maemo.org | DONE | chemist, Reggie | Falk, chemist | vBulletin | Unlimited duration, no upgrades included, acquired on 2012-12-20 | Captcha image issues | 2013-02-10 | |
Maemo Wiki | http://wiki.maemo.org | BUGS | ? | Nemein | (Watch) Email not working; random connection timeouts | 2013-01-25 | |||
Repositories | http://repository.maemo.org | BUGS | X-Fade, Merlin1981 | Nemein | former akamai serverfarm, now points to stage.m.o VM master of farm. Hashsum errors legacy | 2013-02-20 | |||
Blog aggregator | http://planet.maemo.org | DONE | ? | Nemein | login flawed? | 2013-02-10 | |||
Maemo Garage | https://garage.maemo.org/ | DONE | ?, Woody | Nemein | 2013-01-25 | ||||
Maemo Autobuilder | NST | X-Fade | Nemein | OFFLINE, x-fade working on it | 2013-02-20 | ||||
Maemo Nameservers | WIP | Merlin, Falk | Nokia | Still using Nokia Nameservers; following hidden primary plan til domain transfer to HiFo established | 2013-01-25 | ||||
Drop | http://drop.maemo.org | WIP | X-Fade | Nemein | 2013-02-10 | ||||
VCS | http://vcs.maemo.org | WIP | Nemein | 2013-02-10 | |||||
Listserv | https://lists.maemo.org | BUGS | Nemein | occasional lockups resp interface down | 2013-02-20 | ||||
Static | http://static.maemo.org | WIP | Nemein | temporary fix via NAT port81 redir, instable? | 2013-02-20 | ||||
Stage | http://stage.maemo.org | obsolete | X-Fade | Nemein | VM got assigned to repository.m.o | 2013-02-20 | |||
Bugs | http://bugs.maemo.org | DONE | Andre | Nemein | - | 2013-01-25 | |||
Scratchbox | http://scratchbox.org/ | WIP | thedead1440 | Nemein, thedead1440 | 80.248.164.245, Logica Finland Oy, migration pending | 2013-02-20 | |||
Voting Infrastructure | ? | WIP | woody14619 | ? | ? | 2013-02-20 |
More Detailed Information
In this sub section more detailed information about the entries in the table can be placed. The intent is to keep the table concise while still being able to have all relevant information at hand.
List of VMs and their associated IPs:
IP addresses
188.117.59.198 test.maemo.org # www.maemo.org maemo.org
188.117.59.200 www.maemo.org
188.117.59.200 planet.maemo.org
188.117.59.200 static.maemo.org
188.117.59.199 drop.maemo.org
188.117.59.207 garage.maemo.org
188.117.59.204 lists.maemo.org
188.117.59.202 wiki.maemo.org
188.117.59.212 bugs.maemo.org
# 188.117.59.203 repository.maemo.org scrubbed
188.117.59.205 stage.maemo.org repository.maemo.org (reassigned)
188.117.59.206 vcs.maemo.org
List of internal IP/VM
127.0.0.1 MaemoTemplate
10.0.0.1 maemo static maintenance
10.0.0.2 wiki bugs
10.0.0.121 stage repository
10.0.0.4 mail smtp lists
10.0.0.5 scratchbox
10.0.0.6 dns
#10.0.0.7 repository
10.0.0.9 vcs drop
10.0.0.10 garage
10.0.0.11 db backup
10.0.0.12 builder
10.0.0.254 fw
Cpu Cores, RAM (in MB), storage (DISK, in GB), of the VMs
Current VMs actually in use (some more were reserved originally since it was not certain what services could be merged)
Name      C    RAM  DISK
------------------------
MaemoFW   1   1024    10
Builder   1   4096   150
garage    2   8192   100
test      2   2048    30
wikib     2   2048    50
www       2   6144    70
vcs       2   8192   200
db        2   8192   260
mail      2   2048    30
stage     2   2048   870
talk      2   4096    15
========================
         20  48128  1785
sb        2   2048    30
dns       2   2048    30
========================
         25  52224  1845
Forum (talk.maemo.org)
Unlike the other services, talk.maemo.org is not behind the Endian firewall. Maintenance access is not via the test jumpserver.
Software: vBulletin licence: Unlimited duration, no upgrades included, acquired on 2012-12-20
Scratchbox
Scratchbox is also sponsored by Nokia. (Please verify?) Scratchbox is required for running the Fremantle and Harmattan SDK.
Currently there's a VM on Nemein's xen-grid named "scratchbox", but state of the case is unclear.
Tracker for Sysops and Maintainers
This tracker is meant for maemo staff and affiliates only
web frontend: roundup.fourecks.de/maemo/
mail access (read docs!): maemo-issue AT fourecks.de
Service Maintainers (please update/augment/fix)
(please don't usually pester maintainers directly! First try to contact council@maemo.org, we'll forward)
These are the Service Maintainers (in spe), for services like forum (tmo), wiki, bugs, etc. They are (generally) not sysops of the machines their service is running on.
From | Nick | Full Name | Services Maintained | Status | Comments |
---|---|---|---|---|---|
Nemein | mashiara | Rambo Eero af Heurlin | eero.afheurlin at <to be disclosed by owner> | (sysop) | [leaving?] |
Nemein | x-fade | Niels Breet | Niels<at>maemo.org | (mail, IRC, builder, ???...) | [leaving?] |
Nemein | ferenc | Ferenc Szekely | ferenc<at>maemo.org | (mail, sysop, ???...) | [leaving?] |
maemo | warfare | Falk Stern | falk<at>fourecks.de | (maemo master sysop) | |
maemo | chemist | Ruediger Schiller | webmaster<at>talk.m.o | Talk | |
maemo | merlin1991 | Christian Ratzenhofer | <at> | Repos | [preliminary accepted] |
??? | andre_ | Andre Klapper | ???<at>??? | Bugs | [???] |
??? (wiki) | (planet???) |
Unsorted Hints
ssh access
All legacy accounts got ported to new infra.
Access to any VM is via plain direct ssh:
ssh <user>@<VM>.maemo.org
backup
we're doing backups to the 4TB auxiliary storage on blade-a, using backupPC:
ssh -L8088:localhost:80 blade-a
konqueror http://localhost:8088
backup-master is Falk
talk VM sysop (chem|st) has access to it and control over own backups, via ssh config on blade-a:
command="sleep 1d",permitopen="127.0.0.1:80" <ssh-pubkey>
Steering
council is in charge of any steering.
Joerg Reisenweber has been appointed "maemo.org infra administration coordinator" and thus is the single point of coordination for any detail questions.
If you have any questions, suggestions, criticism, whatever, please contact Joerg (DocScrutinizer) or any other council member via IRC, or send a mail to council AT maemo.org. We're just the community's proxies, acting with the best intentions to do what's probably in the community's best interest. If you don't agree with what we do or have suggestions on how we could do better, please holler. Best place: Friday 1800UTC IRC:(freenode.net)#maemo-meeting
More
- OBS @ TiZen or SuSe : https://bugs.tizen.org/jira/browse/TINF-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
Autobuilder and friends
maemo autobuilder setup
autobuilder consists of multiple VMs
drop VM
this VM has /etc/passwd synchronised with garage and ~ folders mounted via NFS from garage
account synchronisation is handled by scripts running on garage VM and then sync is triggered using ssh and scripts in /usr/local/bin
packages are uploaded to /mnt/incoming-builder via SCP
garage VM
this is the VM where stuff happens
password/account sync to gforge/postgresql is done using
*/10 * * * * root /usr/local/bin/add_groups_users_git_ssh.sh > /tmp/add_groups_users_git_ssh.log dev/null 2>&1
this also updates ~/.ssh/authorized_keys
garage also handles web extras-uploader (/var/lib/extras-assistant/) - package is uploaded and then moved to the same folder as packages uploaded to drop and then chowned using
/var/lib/extras-assistant/bin/copy_package_files_to_autobuilder.sh
A lot of jobs on the garage VM are run from the local root crontab (/var/spool/cron/crontabs/root)
after package is uploaded it's processed by buildME
buildME runs as builder user and it's started from cron every minute
* * * * * builder /home/builder/buildme
buildme is configured using /etc/buildme.conf
buildme takes care of a couple of things
- verify that .tar.gz and other files are correct (checked using checksum from .dsc file)
- select free destination (buildme can handle parallel builds on multiple hosts/users)
- scp all required files to selected destination
- start sbdmock on the destination
- copy results back and resulting .deb to repository incoming folder (result_dir = /mnt/builder/%(product)s and repo_queue = /mnt/incoming/extras-devel/%(product)s/)
- send emails to list and user uploading package
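The first step in the list (checking uploaded files against the checksums in the .dsc file) can be sketched as follows. This is not buildme's actual code, which is not shown here; the function name `verify_dsc` is hypothetical, and the sketch assumes the standard Debian `Files:` field (MD5 sum, size, file name per line) with the referenced files lying next to the .dsc.

```shell
# verify_dsc PATH_TO_DSC
# Compare the MD5 sum of every file listed in the Files: field of a
# .dsc against the file of the same name in the same directory.
verify_dsc() {
    dsc=$1
    dir=$(dirname "$dsc")
    # Print "md5 name" for each indented line of the Files: field
    awk '/^Files:/ {f=1; next} /^[^ ]/ {f=0} f && NF==3 {print $1, $3}' "$dsc" | (
        while read -r sum name; do
            actual=$(md5sum "$dir/$name" 2>/dev/null | cut -d' ' -f1)
            [ "$actual" = "$sum" ] || { echo "checksum mismatch: $name" >&2; exit 1; }
        done
    )
}
```

The function returns 0 if all listed files match and 1 on the first missing or mismatching file.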
builder VM
this VM has standard installation of scratchbox with no targets configured (it's not required for sbdmock)
when sbdmock is started it cleans up the old build folder, creates a new target, prepares the build environment and then runs dpkg-buildpackage
sbdmock also generates logfiles that are parsed by buildme
repository/stage VM
this is where repository management happens
*/2 * * * * repository /home/repository/queue-manage-extras-devel.sh
*/5 * * * * repository /home/repository/queue-manage-extras.sh
*/5 * * * * repository /home/repository/queue-manage-community-testing.sh
*/5 * * * * repository /home/repository/queue-manage-community.sh
those scripts (and scripts inside /home/repository/queue-manager-extras) check for new packages in repository incoming folder and then move those to /var/repository/staging, regenerate Packages
(using sums that were previously cached) and sign it if required and then if any changes happened
#touch .changed file, so we know that we need to sync to live
touch /var/repository/staging/community/.$dist.changed
this file is then checked by
1003 10634 1 0 Mar18 ? 00:00:00 /bin/sh /usr/local/bin/packages/rqp.sh
started by /etc/init.d/repository-qp
this script starts rsync when required to sync to live repository
this script also starts repository-queue-proc.php that processes repository updates coming from midgard (old package cleanup and promotions)
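The marker-file handshake described above (queue scripts touch `.$dist.changed`, a long-running script syncs when it sees one) follows a common pattern that can be sketched like this. The contents of rqp.sh are not shown on this page, so this is only an illustration of the idea: the function name `sync_changed` and its calling convention are assumptions.

```shell
# sync_changed STAGING_DIR COMMAND [ARGS...]
# For every .<dist>.changed marker in STAGING_DIR, remove the marker and
# run COMMAND with the dist name appended as last argument.
sync_changed() {
    staging=$1
    shift
    for marker in "$staging"/.*.changed; do
        [ -e "$marker" ] || continue     # glob matched nothing
        dist=${marker##*/.}              # strip directory and leading dot
        dist=${dist%.changed}
        rm -f "$marker"                  # clear first, so later changes re-trigger
        "$@" "$dist" || touch "$marker"  # put the marker back if the sync failed
    done
}
```

A caller would pass its own sync command, for example a wrapper around rsync, as in `sync_changed /var/repository/staging/community my_rsync_wrapper`, where `my_rsync_wrapper` is a placeholder name.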