Showing posts with label OpenSolaris. Show all posts

Saturday, January 25, 2025

Self-hosting OpenSolaris under qemu-system-sparc -M niagara

Back to the Niagara target, which I haven't touched since 2017. Back then I was playing with the Machine Description files, but afair OpenSolaris and Debian had different expectations. Unfortunately I didn't write up my experiments, so I don't remember exactly how I created these machine definitions. The format is actually documented in FWARC 2005/115, but it takes time to understand: in some chapters it looks like a unidirectional Directed Acyclic Graph, while in other chapters there are references to "fwd" and "back". Anyway, I've uploaded whatever I produced to my GitHub. At least it's possible to have 1 GiB of RAM.

Having 1 GiB allows QEMU to boot various OpenSolaris/Illumos distributions, for instance the last dilos release for sun4v (dilos-net-1.3.7.136-sparc64.iso), but let’s start with the OpenSolaris image ramdisk.snv-b77-nd.no-boot-time-network.gz, which was released as a part of OpenSparc T1. It is optimized for the Niagara machine QEMU emulates. In particular, it has the hsimd driver for a RAM disk. The other illumos distributions don’t seem to have it, although it’s released under GPLv2, so it should be possible to build it anywhere.

The v9os and Tribblix distributions use a bootarchive and have many more features, which makes them too heavy for QEMU with its current performance. Booting Tribblix takes ~1 hour on my laptop. The sun4v emulation can definitely be optimized significantly; I will get back to this topic later.

As the name suggests, the image has no network support. So, let’s add it. Not having a network card seems to be a challenge, but then again, back in the nineties I didn’t have a network card either. There is a serial line, and this is enough to start hacking. At my university we dialed into a UNIX console, executed slirp and then started pppd.

Chapter 1.  Experimenting with networking

The gunzipped snv_77 image can be mounted, for instance, under the qemu sun4m emulation or on any physical machine. Maybe it can even be mounted read/write under Linux, but RHEL9/OL9 don’t have the ufs.ko driver out of the box, so I haven’t tried it.

Initially I planned to use the authentic slirp.10c.sol24sparc binary, but alas, it wasn’t archived, and my google-fu was not strong enough to find it anywhere. So, I used a Solaris 9 machine to compile the binary myself. It wanted some crypt libraries, which differ between my Solaris 9 installation and OpenSolaris snv_77. I don’t need any encryption as long as the communication stays on localhost, so I simply did

sed -e 's#crypt#nocrypt#g' configure > configure-nocrypt && chmod +x configure-nocrypt && ./configure-nocrypt && make
Let's see if it's ok for snv77:
# /usr/local/bin/slirp -P
Slirp v1.0.16 (BETA) 

Copyright (c) 1995,1996 Danny Gasparovski and others. 
All rights reserved. 
This program is copyrighted, free software. 
Please read the file COPYRIGHT that came with the Slirp 
package for the terms and conditions of the copyright. 

IP address of Slirp host: 192.168.186.100 
[none found] 
Your address is 10.0.2.15 
(or anything else you want) 

Type five zeroes (0) to exit. 

[talking PPP, 115200 baud] 

SLiRP Ready ...
Nice! At this point I killed the user socat session and used socat to create a virtual serial port.
socat pty,link=/dev/snv77,raw UNIX:/tmp/snv77.sock
Then I started pppd, and I swear I could hear a phantom modem connect sound. The “anything else” message gave me the idea to specify a different address (I already have 10.0.2.15 on another interface), but el9 pppd failed to negotiate it with slirp. So I kept 10.0.2.15 for the moment (later I changed it to 10.0.5.15). Let’s see if the machine is reachable.
$ telnet 192.168.186.100 
Trying 192.168.186.100...
And nothing happens. And actually, where is this 192.168.186.100 coming from? Oh, it’s defined in /etc/hosts. The physical FPGA machine for this image would have this address on its network card. But I don’t have a network card. The only one out there is the loopback with 127.0.0.1, which cannot be used for obvious routing reasons. No problem, another loopback to the rescue:
# ifconfig lo0:1 plumb 
# ifconfig lo0:1 192.168.186.100 up
Is it reachable now?
$ telnet 192.168.186.100 
Trying 192.168.186.100... 
Connected to 192.168.186.100. 
Escape character is '^]'. 
login: root 
Last login: Tue Jan 19 06:46:15 on console 
Sun Microsystems Inc.   SunOS 5.11      snv_77  October 2007 
#
Good. Telnet is nice, but it is extremely inconvenient for transferring files. And since I’m on a nineties trip, let’s use rsh for authenticity. Luckily I still have an OpenSolaris b77 DVD with rsh and all the necessary libraries. So I added in.rshd from the SUNWrcmdr package and /usr/lib/libcmd.so.1 from SUNWcsl.
$ rsh 192.168.186.100 
::ffff:192.168.186.100: Connection refused
What? Why? Actually this happens because rsh without a command works like rlogin, which talks to a totally different daemon on a different port (513 instead of 514). I think this is a violation of the main UNIX principle: one program should do just one thing. So, rsh went against the rules, and where is it now?
rsh -l root 192.168.186.100 ls -l
This one just hangs.

In my previous post I was wondering if anyone used rsh to execute commands on remote hosts back in the nineties. I was sure I used to do it, but found no success reports on the Net.

I even thought that this was more evidence of the Mandela Effect: the only reference I could find stated that sending commands over rsh did not work.

I’ve looked at the code and found that internally slirp acts as a proxy, executing another rsh and piping the data back to the client. So I simply removed the support for rsh, 

$  git diff
diff --git a/src/ctl.h b/src/ctl.h
index 4a8576d..3518fb3 100644
--- a/src/ctl.h
+++ b/src/ctl.h
@@ -3,5 +3,5 @@
 #define CTL_ALIAS      2
 #define CTL_DNS                3
 
-#define CTL_SPECIAL    "10.0.2.0"
-#define CTL_LOCAL      "10.0.2.15"
+#define CTL_SPECIAL    "10.0.5.0"
+#define CTL_LOCAL      "10.0.5.15"
diff --git a/src/tcp_subr.c b/src/tcp_subr.c
index c14755a..0049778 100644
--- a/src/tcp_subr.c
+++ b/src/tcp_subr.c
@@ -563,7 +563,7 @@ struct tos_t tcptos[] = {
          {0, 23, IPTOS_LOWDELAY, 0},   /* telnet */
          {0, 80, IPTOS_THROUGHPUT, 0}, /* WWW */
          {0, 513, IPTOS_LOWDELAY, EMU_RLOGIN|EMU_NOCONNECT},   /* rlogin */
-         {0, 514, IPTOS_LOWDELAY, EMU_RSH|EMU_NOCONNECT},      /* shell */
+/* don't         {0, 514, IPTOS_LOWDELAY, EMU_RSH|EMU_NOCONNECT},       shell */
          {0, 544, IPTOS_LOWDELAY, EMU_KSH},            /* kshell */
          {0, 543, IPTOS_LOWDELAY, 0},  /* klogin */
          {0, 6667, IPTOS_THROUGHPUT, EMU_IRC}, /* IRC */

and added the rsh ports to the port forwarding. This way, the Solaris in.rshd immediately complained that

 Jan 19 14:22:11 t1-fpga-00 rsh[270]: [ID 521673 daemon.notice] connection from 192.168.186.100 (192.168.186.100) - bad port
Makes sense. Slirp is usually started as a normal user, so it connects from unprivileged ports, whereas in.rshd expects the client port to be in the range 512-1023:
 bad_port = (port >= IPPORT_RESERVED ||
		port < (uint_t)(IPPORT_RESERVED/2));
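
To see which client ports survive that test, here is a minimal shell sketch of the same comparison. It assumes IPPORT_RESERVED is 1024, the usual value from netinet/in.h; is_bad_port is a name made up for this illustration, not something taken from in.rshd.

```shell
# Assumption: IPPORT_RESERVED has its usual value of 1024.
IPPORT_RESERVED=1024

# Returns success (0) when the port would be rejected by the check above:
# port >= IPPORT_RESERVED or port < IPPORT_RESERVED/2.
is_bad_port() {
    port=$1
    [ "$port" -ge "$IPPORT_RESERVED" ] || [ "$port" -lt $((IPPORT_RESERVED / 2)) ]
}

for p in 511 512 1023 1024 40000; do
    if is_bad_port "$p"; then
        echo "$p: bad port"
    else
        echo "$p: accepted"
    fi
done
```

Of these five inputs only 512 and 1023 are accepted, which is why a connection coming from an unprivileged slirp port gets rejected.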

Fine, let’s hack in.rshd and remove this check by patching in a SPARC nop (0x01000000):

 ./gdb-sparc64-solaris  --write -q in.rshd 
Reading symbols from /home/tyom/snv77/prep-rsh/usr/sbin/in.rshd...(no debugging symbols found)...done. 
(gdb) set *(int *) 0x000139c4=0x01000000 
(gdb) quit

And after that I'm back where I started. in.rshd doesn’t complain, and “rsh -l ls” waits for something. Then it occurred to me that I had recently switched my laptop to OL9. I had checked the iptables settings immediately after first encountering the hanging rsh process, and the ports were open. But you know what? RHEL9/OL9 use firewalld by default. And there, indeed, communication to the ports 513-1023 is not permitted by default, which totally makes sense.

But since I don’t expect to be hacked from my own VM, I’ve permitted these connections. And now

$ rsh -lroot 192.168.186.100 "uname -a" 
SunOS t1-fpga-00 5.11 snv_77 sun4v sparc sun4v
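
The firewalld change itself is not shown above. A sketch of what it could look like, assuming the ppp interface falls into the default firewalld zone and that the 513-1023 range mentioned above is what needs opening. Both open_rsh_ports and the FIREWALL_CMD override are hypothetical; the override is only there so the helper can be dry-run with echo.

```shell
# Hypothetical helper; run as root. Without --permanent the rule is
# runtime-only and disappears on the next firewalld reload or reboot.
open_rsh_ports() {
    range=${1:-513-1023}
    ${FIREWALL_CMD:-firewall-cmd} --add-port="${range}/tcp"
    ${FIREWALL_CMD:-firewall-cmd} --list-ports
}
```

Calling open_rsh_ports as root opens the range until the next reload, while FIREWALL_CMD=echo open_rsh_ports only prints the arguments that would be passed.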

Woo-hoo! We’ve got a networked sparc64 OpenSolaris image which can be used in self-hosting mode. You can put whatever files you want there with

  cat filename | rsh -lroot 192.168.186.100 "cat >filename" 
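
For whole directory trees the same trick should work with tar on both ends. This is a sketch under assumptions: tar must exist on the guest image (not verified here), and push_tree and the RSH variable are hypothetical names. RSH can be pointed at a local shell to try the pipeline without a guest.

```shell
# Remote-shell command; word splitting on $RSH below is intentional.
RSH=${RSH:-rsh -lroot 192.168.186.100}

# push_tree SRC_DIR DEST_DIR: stream the contents of SRC_DIR through tar
# into DEST_DIR on the other side. DEST_DIR must not contain single quotes.
push_tree() {
    src=$1
    dest=$2
    tar cf - -C "$src" . | $RSH "cd '$dest' && tar xf -"
}
```

For example, push_tree ./files /tmp would unpack the contents of ./files into /tmp on the guest; with RSH="sh -c" the same call copies between two local directories.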
Then shut it down,
#  init 5
And finally in the qemu monitor,
 (qemu) pmemsave 0x1f40000000 83886080 vdisk.ram

Where 83886080 is the current size of the virtual disk in bytes (can be checked with ls -l). The next time you boot with the vdisk.ram, you’ll find the changes made to the FS before "init 5" and "pmemsave". Of course, I don’t encourage anyone to use rsh; this was done just for fun. If you want to try it yourself, the next chapter describes how to use the image created by the experiments described above.
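
The size argument of pmemsave above is simply 80 MiB. If the image size ever changes, the monitor command can be derived from the file instead of being typed by hand. A small sketch, assuming the disk image file has exactly the size of the virtual disk (as the ls -l hint suggests); pmemsave_cmd and VDISK_BASE are names invented for this example.

```shell
# Physical address of the virtual disk, as used in the pmemsave call above.
VDISK_BASE=0x1f40000000

# pmemsave_cmd IMAGE [OUTFILE]: print the monitor command for IMAGE's size.
pmemsave_cmd() {
    image=$1
    out=${2:-vdisk.ram}
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$image") ))
    echo "pmemsave $VDISK_BASE $size $out"
}

echo $((80 * 1024 * 1024))   # 80 MiB in bytes, prints 83886080
```

The printed line can then be pasted at the (qemu) prompt.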

Chapter 2. Using the snv-with-slirp image

The image for the experiments below currently resides here. Unpack it first.

Launching it can be done with 6 terminals, 5 run with user privileges and 1 with root: 

  1. qemu-system-sparc64 -M niagara:
     ./qemu-system-sparc64 -M niagara -L ../1GiB-snv_77/ -m 1024 -nographic -serial unix:/tmp/snv77.sock,server -drive if=pflash,readonly=on,file=../sparc-disks/snv-with-slirp

    At the end of the session, the (qemu) prompt can optionally be used to save the RAM disk contents into a file:
     (qemu) pmemsave 0x1f40000000 83886080 vdisk.ram
  2. socat
     socat STDIO,raw,echo=0 UNIX:/tmp/snv77.sock
    
    This one is used as a temporary helper to submit the boot command, login, configure the IP address alias and start the slirp process
     boot -vV
    

    As soon as the login prompt appears
    SunOS Release 5.11 Version snv_77 64-bit
    Copyright 1983-2007 Sun Microsystems, Inc.  All rights reserved.
    Use is subject to license terms.
    os-io Ethernet address = 0:80:3:de:ad:3
    Using default device instance data
    mem = 1048576K (0x40000000)
    avail mem = 945700864
    root nexus = Sun Fire T2000
    pseudo0 at root
    pseudo0 is /pseudo
    scsi_vhci0 at root
    scsi_vhci0 is /scsi_vhci
    virtual-device: hsimd0
    hsimd0 is /virtual-devices@100/disk@0
    root on /virtual-devices@100/disk@0:a fstype ufs
    pseudo-device: dld0
    dld0 is /pseudo/dld@0
    cpu0: UltraSPARC-T1 (cpuid 0 clock 5 MHz)
    pseudo-device: devinfo0
    devinfo0 is /pseudo/devinfo@0
    Hostname: t1-fpga-00
    
    t1-fpga-00 console login:
    
    log in as root and then
    ifconfig lo0:1 plumb
    ifconfig lo0:1 192.168.186.100 up
    echo + >/etc/hosts.equiv
    /usr/local/bin/slirp-1.0.16-no-rsh-emu -P "redir 1023 1023" "redir 1022 1022" "redir 1021 1021"
    
  3. After slirp is started, kill the temporary socat
    killall socat
    
    (beware that this also kills any other socat processes belonging to the user. If you have any, you should use something more clever than killall. I don’t, so it works for me)

  4. As root:
     socat pty,link=/dev/snv77,raw UNIX:/tmp/snv77.sock & pppd local -detach /dev/snv77
    
    This socat connects the unix socket to pty, which is used by pppd.
     
  5. Optionally start a telnet session. It is handy to see what is available out there.
     
  6. The last terminal is used to execute rsh commands on the Solaris guest, which is handy for scripting and file transfer.
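
Step 3 above warns that killall takes down every socat of the user. One possible alternative, sketched here with hypothetical helper names and an arbitrary pidfile path, is to record the PID when the temporary socat is started and later kill only that process:

```shell
# run_with_pidfile PIDFILE CMD [ARGS...]: write the PID to PIDFILE, then
# exec CMD so that the recorded PID is the PID of CMD itself.
# The command stays in the foreground, so an interactive socat still works.
run_with_pidfile() {
    pidfile=$1
    shift
    sh -c 'echo "$$" > "$0"; exec "$@"' "$pidfile" "$@"
}

# stop_from_pidfile PIDFILE: kill only the recorded process.
stop_from_pidfile() {
    kill "$(cat "$1")" && rm -f "$1"
}

# Terminal 2:
#   run_with_pidfile /tmp/snv77-socat.pid socat STDIO,raw,echo=0 UNIX:/tmp/snv77.sock
# Terminal 3:
#   stop_from_pidfile /tmp/snv77-socat.pid
```

Both terminals need the two functions, for example by sourcing them from the same file.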

Of course, it can be done with far fewer than 6 terminals. QEMU can be connected directly to a pty, and booting and starting slirp can be done, for instance, by a pppd expect script or by changing the guest init sequence. Having the 6 terminals just gives more control and makes it easier to debug if something breaks.

/Happy hacking

Saturday, February 20, 2016

Bad, bad cafe! (0xbaddcafe)

Debugging Solaris 10 boot I saw something interesting in an exception trace:

143368: Unaligned Memory Access (v=0034)
pc: 00000000f02421f8  npc: 00000000f02421fc
%g0-3: 0000000000000000 0000000000000001 0000000000000000 00000000edd00620
%g4-7: baddcafebaddcafe 0000000000002e7f 0000000000000000 00000000f0243de8 
%o0-3: 00000000018d46e0 0000000000000001 00000000ede8e7e1 0000000001213010

And indeed, this is not a random pattern. It's a helping hand from the great, wise Solaris engineers who cared to help their descendants find problems with hardware and kernel modules:

opensolaris/usr/src/uts/common/sys/kmem_impl.h:
#define  KMEM_UNINITIALIZED_PATTERN      0xbaddcafebaddcafeULL

Looking at the OpenSolaris sources and Solaris documentation, there are more such helping patterns:

Uninitialized Data: 0xbaddcafe
Redzone: 0xfeedface
Freed Buffer Checking: 0xdeadbeef

They are described in the "Detecting Memory Corruption" chapter of the Solaris Modular Debugger Guide, but actually appeared long before mdb.
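
When staring at a register dump like the one above, a tiny helper can flag these fill patterns. A sketch in shell; kmem_pattern is a made-up name, and it only knows the three 32-bit patterns listed above.

```shell
# kmem_pattern HEXVALUE: name the kmem debugging pattern contained in a
# hex value (32-bit or 64-bit, upper or lower case, no 0x prefix needed).
kmem_pattern() {
    case $(printf '%s' "$1" | tr 'A-F' 'a-f') in
        *baddcafe*) echo "uninitialized data" ;;
        *feedface*) echo "redzone" ;;
        *deadbeef*) echo "freed buffer" ;;
        *)          echo "no known pattern" ;;
    esac
}

kmem_pattern baddcafebaddcafe   # the %g4 value from the trace above
```

For the %g4 value this reports uninitialized data, which matches the unaligned access: the kernel dereferenced a pointer that was never set.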

Saturday, May 21, 2011

Now finally something new

Loading: /platform/sun4u/boot_archive
ramdisk-root ufs-file-system
Loading: /platform/sun4u/kernel/sparcv9/unix
module /platform/sun4u/kernel/sparcv9/unix: text at [0x1000000, 0x10bf34d] data at 0x1800000
module /platform/sun4u/kernel/sparcv9/genunix: text at [0x10bf350, 0x12b5c7f] data at 0x1865e00
module /platform/sun4u/kernel/misc/sparcv9/platmod: text at [0x12b5c80, 0x12b5c97] data at 0x18bac30
module /platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-II: text at [0x12b5cc0, 0x12c2a37] data at 0x18bb2c0
SunOS Release 5.11 Version MilaX_0.3.2 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
os-io Ethernet address = 52:54:0:12:34:56
Using default device instance data
mem = 262144K (0x10000000)
avail mem = 154615808
...
Preparing live image for use
Hostname: milax
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

May 20 07:26:42 su: 'su root' succeeded for root on /dev/console
-bash: /usr/sbin/quota: No such file or directory
Sun Microsystems Inc.   SunOS 5.11      MilaX_0.3.2     October 2008
-bash: /bin/mail: No such file or directory
-bash: id: command not found
-bash: id: command not found
-bash: [: !=: unary operator expected
(root@milax)# uname -X
System = SunOS
Node = milax
Release = 5.11
KernelID = MilaX_0.3.2
Machine = sun4u
BusType =
Serial =
Users =
OEM# = 0
Origin# = 1
NumCPU = 1

(root@milax)# uname -a
SunOS milax 5.11 MilaX_0.3.2 sun4u sparc sun4u
(root@milax)#


The missing files above are caused by the maintenance mode which in turn is caused by the missing network card. So the next steps are obvious:
- plug in a supported NIC
- find a customer for the sun4u emulation
- ???
- PROFIT!

PS. Btw, are there any OpenSolaris/Illumos-based live CDs newer than MilaX 0.3.2? The later MilaX versions seem to be PC-only.

Friday, April 22, 2011

Seen a usable #

Well sort of:

Loading: /platform/sun4u/ufsboot
Size: 330820+55556+67364 Bytes
SunOS Release 5.7 Version Generic_106541-08 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1999, Sun Microsystems, Inc.
Ethernet address = 52:54:0:12:34:56
Using default device instance data
# uname -X
System = SunOS
Node =
Release = 5.7
KernelID = Generic_106541-08
Machine = sun4u
BusType = <unknown>
Serial = <unknown>
Users = <unknown>
OEM# = 0
Origin# = 1
NumCPU = 1
# ls
a           dev         kernel      opt         proto        tmp
bin         devices     lib        platform    reconfigure  usr
cdrom       etc         mnt         proc        sbin         var

So, the Solaris 2.6 and 7 kernels are functional in a minimal mode (the 2.5.1 and 8+ hang on device detection). The single user mode is not there yet though:

...
ld.so.1: internal: malloc failed
Killed
FATAL ERROR: / file system type "" is unknown
             Exiting to shell.
#

The nice thing is that the Solaris engineers did a really good job making the OS robust. The message above comes from a branch in a script which has the following comment:

            # "this never happens" :-)

Expect the unexpected, well done!

Now, the bad news: MilaX 0.3.2, and probably other OpenSolaris distributions are eager to play with the E-Cache, which qemu doesn't emulate. If I find the way to tell MilaX that there is no cache, it probably would get it up to the command line too. The old Solaris versions had a '-n' boot parameter to switch the cache off, but MilaX says this boot parameter is invalid.

Gonna take a break now. (Happy Easter, everyone!) The next stop is a working single user mode. Stay tuned!

Saturday, January 23, 2010

OpenSolaris sources are beautiful

Trying to find the roots of the "hsfs_putpage: dirty HSFS page" error, I looked in the OpenSolaris source.

High Sierra is pretty old and stable stuff, so it is possible that the code I'm debugging is similar to the OpenSolaris one.
I looked in the debugger, and the function call hierarchy looks pretty similar.

Now in the OpenSolaris source code there is a nice comment:

/*
* Normally pvn_getdirty() should return 0, which
* impies that it has done the job for us.
* The shouldn't-happen scenario is when it returns 1.
* This means that the page has been modified and
* needs to be put back.
* Since we can't write on a CD, we fake a failed
* I/O and force pvn_write_done() to destroy the page.
*/
if (pvn_getdirty(pp, flags) == 1) {
               cmn_err(CE_NOTE,
                           "hsfs_putpage: dirty HSFS page");

The bright side: I don't know any other open source project that is so nicely documented. The description confirms the suspicion I had: it's a problem with the MMU emulation.

The dark side:  it's not just the problem with hsfs. Other file systems will have this bug too, and there it must be even more dramatic: they must be constantly writing cache data back to disk.

A 100% accurate MMU & MXCC emulation in qemu would make memory access very slow. I still hope we can avoid this, but I don't know how.

Sunday, August 23, 2009

Sun Studio for free

Currently there are two options to get Sun Studio for free:

- Everyone can have Sun Studio 12 update 1. There are Solaris/sparc, Solaris/intel and Linux/i686 versions. There seem to be compatibility issues with ld on newer Linux distributions. The error message reads "libm format not recognized". The semi-official solution is

rm /opt/sun/sunstudio12/prod/lib/amd64/ld
ln -s /usr/bin/ld /opt/sun/sunstudio12/prod/lib/amd64/ld

Also there are problems with headless install under Linux. But it is possible to extract all the rpms with the --extract-installation-data command line option.

- OpenSolaris developers may get version 10 here, but only the Solaris versions, not the Linux one. I wonder why they would need to mess with the older version 10 when there is a shiny new 12u1. Are there any known compatibility issues in 12u1?