Hadoop Ecosystem Overview
2
About Brillix
3
Who am I?
• Zohar Elkayam, CTO at Brillix
• DBA, team leader, instructor and a senior consultant for over 17 years
• Blogger – www.realdbamagic.com
4
Big Data
"Big Data"??
Different definitions
“Big data exceeds the reach of commonly used hardware environments
and software tools to capture, manage, and process it within a tolerable
elapsed time for its user population.” - Teradata Magazine article, 2011
“Big data refers to data sets whose size is beyond the ability of typical
database software tools to capture, store, manage and analyze.”
- The McKinsey Global Institute, 2012
6
A Success Story
8
More success stories
9
MORE stories..
• Crime Prevention in Los Angeles
• Astronomical discoveries
10
Examples of Big Data Use Cases Today
11
Most Requested Uses of Big Data
12
The Challenge
Big Data Big Problems
• Unstructured
• Unprocessed
• Un-aggregated
• Un-filtered
• Repetitive
• Low quality
• And generally messy
Oh, and there is a lot of it
14
The Big Data Challenge
15
Big Data: Challenge to Value
Challenges today: high volume, high variety, high velocity, real time.
Business value tomorrow: deep analytics, high agility, massive scalability.
16
Volume
• Big data comes in one size: big.
17
Some Numbers
• How much data in the world?
– 800 Terabytes, 2000
– 160 Exabytes, 2006 (1 EB = 10^18 B)
– 4.5 Zettabytes, 2012 (1 ZB = 10^21 B)
– 44 Zettabytes by 2020
18
Data grows fast!
19
Growth Rate
How much data is generated in a day?
– 7 TB, Twitter
– 10 TB, Facebook
20
Variety
21
Structured & Un-Structured
– Un-Structured: Objects
– Structured: Tables
22
Big Data is ANY data:
Unstructured, Semi-Structured and Structured
23
Data Types by Industry
24
Velocity
25
Global Internet Device Forecast
26
Internet of Things
27
Veracity
• Quality of the data can vary greatly
28
So, What Defines Big Data?
• When we think that we can produce value from that data
and want to handle it
29
Handling Big Data
Big Data in Practice
32
Big Data in Practice (cont.)
33
Infrastructure Challenges
34
Infrastructure Challenges (cont.)
• Storage:
– Efficient and cost-effective enough to capture and
store terabytes, if not petabytes, of data
– With intelligent capabilities to reduce your data
footprint such as:
• Data compression
• Automatic data tiering
• Data deduplication
35
Infrastructure Challenges (cont.)
36
Introduction To Hadoop
Apache Hadoop
38
Hadoop Creation History
39
Key points
• An open-source framework that uses a simple programming model to
enable distributed processing of large data sets on clusters of computers.
• The complete technology stack includes
– common utilities
– a distributed file system
– analytics and data storage platforms
– an application layer that manages distributed processing, parallel
computation, workflow, and configuration management
• More cost-effective for handling large unstructured data sets than conventional
approaches, and it offers massive scalability and speed
40
Why use Hadoop?
41
No, really, why use Hadoop?
• Need to process Multi Petabyte Datasets
• Expensive to build reliability in each application
• Nodes fail every day
– Failure is expected, rather than exceptional
– The number of nodes in a cluster is not constant
• Need common infrastructure
– Efficient, reliable, Open Source Apache License
• The above goals are the same as Condor's, but
– Workloads are IO bound and not CPU bound
42
Hadoop Benefits
44
Hadoop Components
Hadoop Main Components
48
HDFS Node Types
HDFS has three types of nodes:
• Namenode (Master Node)
– Distributes file blocks across the cluster
– Responsible for the replication between
the datanodes and for tracking file block locations
• Datanodes
– Responsible for the actual file storage
– Serve file data to clients
• Secondary Namenode
– Performs housekeeping work for the Namenode (not a backup)
49
Typical implementation
50
MapReduce is...
51
MapReduce paradigm
52
Typical large-data problem
53
Divide and Conquer
54
MapReduce - word count example
55
MapReduce Word Count Process
56
MapReduce Advantages
• Runs programs (jobs) across many computers
• Protects against single-server failure by re-running failed steps
• MR jobs can be written in Java, C, Python, Ruby and others
• Users only write Map and Reduce functions
57
MapReduce is good for...
58
MapReduce is OK for...
59
MapReduce is NOT good for...
60
Deep Dive into HDFS
HDFS
• Appears as a single disk
• Runs on top of a native filesystem
– Ext3,Ext4,XFS
• Fault Tolerant
– Can handle disk crashes, machine crashes, etc...
• Based on Google's Filesystem (GFS or GoogleFS)
– gfs-sosp2003.pdf:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/gfs-sosp2003.pdf
– http://en.wikipedia.org/wiki/Google_File_System
62
HDFS is Good for...
• Storing large files
– Terabytes, Petabytes, etc...
– Millions rather than billions of files
– 100MB or more per file
• Streaming data
– Write-once, read-many-times access patterns
– Optimized for streaming reads rather than random reads
– Append operation added to Hadoop 0.21
• “Cheap” Commodity Hardware
– No need for super-computers, use less reliable commodity hardware
63
HDFS is not so good for...
• Low-latency reads
– High-throughput rather than low latency for small chunks of
data
– HBase addresses this issue
• Large amounts of small files
– Better for millions of large files instead of billions of small files
• For example each file can be 100MB or more
• Multiple Writers
– Single writer per file
– Writes only at the end of the file; no support for writes at arbitrary offsets
64
HDFS: Hadoop Distributed File System
• A given file is broken down into blocks
(default = 64MB in Hadoop 1, 128MB in Hadoop 2+), then blocks are
replicated across the cluster (default = 3 replicas)
• Optimized for:
– Throughput
– Put/Get/Delete
– Appends
• Block Replication for:
– Durability
– Availability
– Throughput
• Block Replicas are distributed across
servers and racks
65
HDFS Architecture
• Name Node : Maps a file to a
file-id and a list of blocks (and the Data Nodes holding them)
• Data Node : Maps a block-id to
a physical location on disk
• Secondary Name Node:
Periodic merge of Transaction
log
66
HDFS Daemons
• Filesystem cluster is managed by three types of processes
– Namenode
• manages the File System's namespace/meta-data/file blocks
• Runs on one to several machines
– Datanode
• Stores and retrieves data blocks
• Reports to Namenode
• Runs on many machines
– Secondary Namenode
• Performs housekeeping work so the Namenode doesn’t have to
• Requires similar hardware as Namenode machine
• Not used for high-availability – not a backup for Namenode
67
Files and Blocks
68
HDFS Blocks
• Blocks are traditionally either 64MB or 128MB
– Default is 128MB
• The motivation is to minimize the cost of seeks as compared to
transfer rate
– 'Time to transfer' > 'Time to seek'
• For example, let's say
– Seek time = 10 ms
– Transfer rate = 100 MB/s
• To keep seek time at 1% of transfer time
– Block size needs to be ~100 MB
69
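To double-check that arithmetic, here is a tiny stand-alone Java snippet (illustrative only, not part of any Hadoop API):

public class BlockSizeEstimate {
  public static void main(String[] args) {
    double seekTimeSec = 0.010;      // 10 ms seek time
    double transferRateMBs = 100.0;  // 100 MB/s transfer rate
    double seekFraction = 0.01;      // seek should cost only 1% of transfer time
    // transfer time must be seekTimeSec / seekFraction = 1 s,
    // so block size = transfer rate * transfer time = 100 MB
    double blockSizeMB = transferRateMBs * (seekTimeSec / seekFraction);
    System.out.println("Suggested block size: " + blockSizeMB + " MB");
  }
}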
Block Replication
• Namenode determines replica placement
• Replica placements are rack aware
– Balance between reliability and performance
• Attempts to reduce bandwidth
• Attempts to improve reliability by putting replicas on multiple racks
– Default replication is 3
• 1st replica on the local rack
• 2nd replica on the local rack but different machine
• 3rd replica on a different rack
– This policy may change/improve in the future
70
Data Correctness
71
Data Pipelining
72
Client, Namenode, and Datanodes
74
Namenode Memory Concerns
• For fast access the Namenode keeps all block metadata in-memory
– The bigger the cluster, the more RAM is required
• Best for millions of large files (100 MB or more) rather than billions
• Will work well for clusters of hundreds of machines
• Hadoop 2+
– Namenode Federations
• Each namenode will host part of the blocks
• Horizontally scale the Namenode
– Support for 1000+ machine clusters
75
Using HDFS
Reading Data from HDFS
1. Create FileSystem
2. Open InputStream to a Path
3. Copy bytes using IOUtils
4. Close Stream
77
1: Create FileSystem
• FileSystem fs = FileSystem.get(new Configuration());
– If you run with yarn command,
DistributedFileSystem (HDFS) will be created
• Utilizes fs.default.name property from configuration
• Recall that Hadoop framework loads core-site.xml which
sets property to hdfs (hdfs://localhost:8020)
78
2: Open Input Stream to a Path
...
InputStream input = null;
try {
input = fs.open(fileToRead);
...
• fs.open returns org.apache.hadoop.fs.FSDataInputStream
– Other FileSystem implementations will return their own custom
implementation of InputStream
• Opens stream with a default buffer of 4k
• If you want to provide your own buffer size use
– fs.open(Path f, int bufferSize)
79
3: Copy bytes using IOUtils
IOUtils.copyBytes(inputStream, outputStream, buffer);
• Copy bytes from InputStream to OutputStream
• Hadoop’s IOUtils makes the task simple
– buffer parameter specifies number of bytes to
buffer at a time
80
4: Close Stream
...
} finally {
IOUtils.closeStream(input);
...
• Utilize IOUtils to avoid boilerplate code that catches
IOException
81
ReadFile.java Example
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFile {
  public static void main(String[] args) throws IOException {
    Path fileToRead = new Path("/user/sample/sonnets.txt");
    FileSystem fs = FileSystem.get(new Configuration()); // 1: Create FileSystem
    InputStream input = null;
    try {
      input = fs.open(fileToRead);                // 2: Open InputStream
      IOUtils.copyBytes(input, System.out, 4096); // 3: Copy from Input to Output
    } finally {
      IOUtils.closeStream(input);                 // 4: Close stream
    }
  }
}
$ yarn jar my-hadoop-examples.jar hdfs.ReadFile
82
Reading Data - Seek
83
Seeking to a Position
84
SeekReadFile.java Example
public class SeekReadFile {
85
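The SeekReadFile listing is cut off above; a minimal sketch of a seek-based read, reusing the /user/sample/sonnets.txt path from the earlier example (seek() and getPos() are real FSDataInputStream calls, but the class body is a sketch rather than the original listing):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class SeekReadFile {
  public static void main(String[] args) throws IOException {
    Path fileToRead = new Path("/user/sample/sonnets.txt");
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataInputStream input = null;
    try {
      input = fs.open(fileToRead);
      System.out.println("start position=" + input.getPos()); // 0
      IOUtils.copyBytes(input, System.out, 4096, false);      // read from the start
      input.seek(0);                                          // jump back to the beginning
      IOUtils.copyBytes(input, System.out, 4096, false);      // read the same bytes again
    } finally {
      IOUtils.closeStream(input);
    }
  }
}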
Run SeekReadFile Example
$ yarn jar my-hadoop-examples.jar hdfs.SeekReadFile
86
Write Data
87
WriteToFile.java Example
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class WriteToFile {
  public static void main(String[] args) throws IOException {
    String textToWrite = "Hello HDFS! Elephants are awesome!\n";
    InputStream in = new BufferedInputStream(
        new ByteArrayInputStream(textToWrite.getBytes()));
    Path toHdfs = new Path("/user/sample/writeMe.txt");
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);       // 1: Create FileSystem instance
    FSDataOutputStream out = fs.create(toHdfs); // 2: Open OutputStream
    IOUtils.copyBytes(in, out, conf);           // 3: Copy data (streams are closed by copyBytes)
  }
}
88
Run WriteToFile
90
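A run of WriteToFile, assuming it is packaged in the same example jar used for the earlier examples (the jar and package names are illustrative), followed by a check of the written file:

$ yarn jar my-hadoop-examples.jar hdfs.WriteToFile
$ hdfs dfs -cat /user/sample/writeMe.txt
Hello HDFS! Elephants are awesome!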
MapReduce and YARN
Hadoop MapReduce
92
The MapReduce Model
93
MapReduce Programming Model
94
MapReduce in Hadoop (1)
95
MapReduce in Hadoop (2)
96
MapReduce Framework
97
MapReduce Framework
• Error Handling
– Failures are an expected behavior so tasks are
automatically re-tried on other machines
• Data Synchronization
– Shuffle and Sort barrier re-arranges and moves data
between machines
– Input and output are coordinated by the framework
98
Map Reduce 2.0 on YARN
99
MapReduce1 vs. YARN
100
MapReduce1 vs. YARN (cont.)
101
Daemons
• YARN Daemons
– Node Manager
• Manages resources of a single node
• There is one instance per node in the cluster
– Resource Manager
• Manages Resources for a Cluster
• Instructs Node Manager to allocate resources
• Application negotiates for resources with Resource Manager
• There is only one instance of Resource Manager
• MapReduce Specific Daemon
– MapReduce History Server
• Archives Jobs’ metrics and meta-data
102
Old vs. New Java API
• There are two flavors of MapReduce API which became known as Old and
New
• Old API classes reside under
– org.apache.hadoop.mapred
• New API classes can be found under
– org.apache.hadoop.mapreduce
– org.apache.hadoop.mapreduce.lib
• We will use the new API exclusively (see the import sketch below)
• New API was re-designed for easier evolution
• Early Hadoop versions deprecated old API but deprecation was removed
• Do not mix new and old API
103
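To make the package split concrete, typical new-API imports in a job class look like this (which classes you actually import depends on the job; shown purely as an illustration):

// New API: everything under org.apache.hadoop.mapreduce
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
// The old API equivalents live under org.apache.hadoop.mapred instead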
Developing First
MapReduce Job
MapReduce
105
MapReduce
106
Map Reduce Flow of Data
107
First Map Reduce Job
• StartsWithCount Job
– Input is a body of text from HDFS
• In this case hamlet.txt
– Split text into tokens
– For each first letter sum up all occurrences
– Output to HDFS
108
Word Count Job
109
Starts With Count Job
110
1: Configure Job
• Job class
– Encapsulates information about a job
– Controls execution of the job
Job job = Job.getInstance(getConf(), "StartsWithCount");
• A job is packaged within a jar file
– Hadoop Framework distributes the jar on your behalf
– Needs to know which jar file to distribute
– The easiest way to specify the jar that your job resides in is by calling
job.setJarByClass
job.setJarByClass(getClass());
– Hadoop will locate the jar file that contains the provided class
111
1: Configure Job - Specify Input
112
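The code for this step is not shown here; based on the output-configuration pattern on the following slides, the input side of the job presumably looks something like this (argument order and class choices are assumptions):

// configure input: first program argument is the input path
TextInputFormat.addInputPath(job, new Path(args[0]));
job.setInputFormatClass(TextInputFormat.class);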
Side Note – Hadoop IO Classes
113
1: Configure Job – Specify Output
TextOutputFormat.setOutputPath(job, new Path(args[1]));
job.setOutputFormatClass(TextOutputFormat.class);
• OutputFormat defines specification for outputting data from
Map/Reduce job
• The Count job utilizes an implementation of
OutputFormat - TextOutputFormat
– Define output path where reducer should place its output
• If path already exists then the job will fail
– Each reducer task writes to its own file
• By default a job is configured to run with a single reducer
– Writes key-value pair as plain text
114
1: Configure Job – Specify Output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
• Specify the output key and value types for
both mapper and reducer functions
– Many times the same type
– If types differ then use
• setMapOutputKeyClass()
• setMapOutputValueClass()
115
1: Configure Job
116
1: Configure Job
• job.waitForCompletion(true)
– Submits the job and waits for completion
– The boolean flag specifies whether job progress
should be printed to the console
– If the job completes successfully ‘true’ is
returned, otherwise ‘false’ is returned
117
Our Count Job is configured to
118
1: Configure Count Job
public class StartsWithCountJob extends Configured implements Tool{
@Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(getConf(), "StartsWithCount");
job.setJarByClass(getClass());
119
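The listing continues on the next slide with the output configuration; the wiring in between (mapper, combiner and reducer, plus the input configuration sketched earlier) is presumably along these lines (a reconstruction, not the original slide code):

// configure input as sketched earlier, then wire up the classes
job.setMapperClass(StartsWithCountMapper.class);
job.setCombinerClass(StartsWithCountReducer.class);
job.setReducerClass(StartsWithCountReducer.class);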
StartsWithCountJob.java (cont.)
// configure output
TextOutputFormat.setOutputPath(job, new Path(args[1]));
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(
new StartsWithCountJob(), args);
System.exit(exitCode);
}
}
120
2: Implement Mapper class
121
2: Implement Mapper
public class StartsWithCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable countOne = new IntWritable(1);
private final Text reusableText = new Text();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
StringTokenizer tokenizer = new StringTokenizer(value.toString());
while (tokenizer.hasMoreTokens()) {
reusableText.set(tokenizer.nextToken().substring(0, 1));
context.write(reusableText, countOne);
}
}
}
122
3: Implement Reducer
123
3: Implement Reducer
public class StartsWithCountReducer extends
Reducer<Text, IntWritable, Text, IntWritable> {
@Override
protected void reduce(Text token,
Iterable<IntWritable> counts,
Context context) throws IOException, InterruptedException {
int sum = 0;
124
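The reducer body is cut off above; the rest is the standard summing pattern, presumably along these lines:

    for (IntWritable count : counts) {
      sum += count.get();                       // add up all occurrences for this letter
    }
    context.write(token, new IntWritable(sum)); // emit letter -> total count
  }
}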
3: Reducer as a Combiner
125
4: Run Count Job
126
Output of Count Job
127
$yarn command
128
Input and Output
MapReduce Theory
130
Map Reduce Flow of Data
131
Key and Value Types
132
Key and Value Types
133
WritableComparable<T>
Implementations
Hadoop’s Class Explanation
134
Implement Custom
WritableComparable<T>
• Implement 3 methods
– write(DataOutput)
• Serialize your attributes
– readFields(DataInput)
• De-Serialize your attributes
– compareTo(T)
• Identify how to order your objects
• If your custom object is used as the key it will be sorted
prior to reduce phase
135
BlogWritable – Implementation
of WritableComparable<T>
public class BlogWritable implements
WritableComparable<BlogWritable> {
private String author;
private String content;
public BlogWritable(){}
public BlogWritable(String author, String content) {
this.author = author;
this.content = content;
}
public String getAuthor() {
return author;
}
public String getContent() {
return content;
}
...
136
BlogWritable – Implementation
of WritableComparable<T>
...
@Override
public void readFields(DataInput input) throws IOException {
author = input.readUTF();
content = input.readUTF();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeUTF(author);
output.writeUTF(content);
}
@Override
public int compareTo(BlogWritable other) {
return author.compareTo(other.author);
}
}
137
Mapper
138
InputSplit
139
InputSplit
140
Combiner
141
Combiner Data Flow
142
Sample StartsWithCountJob
Run without Combiner
143
Sample StartsWithCountJob
Run with Combiner
144
Specify Combiner Function
145
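Wiring the reducer in as a combiner is a single call on the Job object; for the StartsWithCount job it would presumably be:

job.setCombinerClass(StartsWithCountReducer.class);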
Reducer
• Extend Reducer class
– Reducer<KeyIn, ValueIn, KeyOut, ValueOut>
– KeyIn and ValueIn types must match output types of mapper
• Receives input from the mappers’ output
– Sorted on key
– Grouped by key of the key-value pairs produced by mappers
– Input is directed by the Partitioner implementation
• Simple life-cycle – similar to Mapper
– The framework first calls setup(Context)
– for each key → list(value) calls
• reduce(Key, Values, Context)
– Finally cleanup(Context) is called
146
Reducer
147
Partitioner Data Flow
148
HashPartitioner
public class HashPartitioner<K, V> extends Partitioner<K, V> {
public int getPartition(K key, V value, int numReduceTasks) {
return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
}
149
Custom Partitioner
public class CustomPartitioner
extends Partitioner<Text, BlogWritable>{
@Override
public int getPartition(Text key, BlogWritable blog,
int numReduceTasks) {
int positiveHash =
blog.getAuthor().hashCode()& Integer.MAX_VALUE;
//Use author’s hash only, AND with
//max integer to get a positive value
return positiveHash % numReduceTasks;
}
}
• All blogs with the same author will end up in the same reduce task
150
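To plug the partitioner into a job it has to be registered explicitly; a hedged sketch (and since the map output value here is a BlogWritable rather than the job's final output type, the map output types are declared as well):

job.setPartitionerClass(CustomPartitioner.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(BlogWritable.class);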
Component Overview
151
Improving Hadoop
Improving Hadoop
153
Noticeable Distributions
• Cloudera
• MapR
• HortonWorks
• Amazon EMR
154
Hadoop Technology Ecosystem
155
Improving Programmability
156
Pig
• “is a platform for analyzing large data sets that consists of a high-level
language for expressing data analysis programs, coupled with
infrastructure for evaluating these programs. “
• Top Level Apache Project
– http://pig.apache.org
• Pig is an abstraction on top of Hadoop
– Provides high level programming language designed for data processing
– Converted into MapReduce and executed on Hadoop Clusters
• Pig is widely accepted and used
– Yahoo!, Twitter, Netflix, etc...
157
Pig and MapReduce
• MapReduce requires programmers
– Must think in terms of map and reduce functions
– More than likely will require Java programmers
• Pig provides high-level language that can be used by
– Analysts
– Data Scientists
– Statisticians
– Etc...
• Originally implemented at Yahoo! to allow analysts to access
data
158
Pig’s Features
• Join Datasets
• Sort Datasets
• Filter
• Data Types
• Group By
• User Defined Functions
159
Pig’s Use Cases
160
Pig Components
• Pig Latin
– Command based language
– Designed specifically for data transformation and flow expression
• Execution Environment
– The environment in which Pig Latin commands are executed
– Currently there is support for Local and Hadoop modes
• Pig compiler converts Pig Latin to MapReduce
– Compiler strives to optimize execution
– You automatically get optimization improvements with Pig updates
161
Pig Code Example
162
Hive
164
When not to use Hive
165
Hive
166
Hive Metastore
167
Hive Architecture
168
1: Create a Table
169
1: Create a Table
170
2: Load Data Into a Table
171
3: Query Data
172
3: Query Data
173
Databases and DB Connectivity
174
HBase
175
When do we use HBase?
176
When not to use HBase
177
HBase Example
Example:
create 'blogposts', 'post', 'image' ---create table
put 'blogposts', 'id1', 'post:title', 'Hello World' ---insert value
put 'blogposts', 'id1', 'post:body', 'This is a blog post' ---insert value
put 'blogposts', 'id1', 'image:header', 'image1.jpg' ---insert value
get 'blogposts', 'id1' ---select records
178
Sqoop
• Sqoop is a command-line tool for moving data from an RDBMS to Hadoop
• Uses MapReduce jobs to load the data, and can load directly into Hive
• Can also export data from Hadoop back to an RDBMS
• Comes with connectors for MySQL, PostgreSQL, Oracle, SQL Server and
DB2.
Example:
$bin/sqoop import --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' \
--table lineitem --hive-import
179
Improving Hadoop – More useful tools
• For improving coordination: Zookeeper
180
ZooKeeper
181
Flume
182
Oozie
183
Spark
Fast and general MapReduce-like engine for large-scale data processing
• Fast
– In-memory data storage for very fast interactive queries; up to 100 times faster
than Hadoop
• General
– Unified platform that can combine: SQL, Machine Learning, Streaming, Graph &
Complex analytics
• Ease of use
– Can be developed in Java, Scala or Python
• Integrated with Hadoop
– Can read from HDFS, HBase, Cassandra, and any Hadoop data source.
184
Spark is the Most Active Open Source
Project in Big Data
185
The Spark Community
186
Key Concepts
Write programs in terms of transformations on
distributed datasets
Resilient Distributed Datasets (RDDs)
• Collections of objects spread across a cluster, stored in RAM or on disk
• Built through parallel transformations
• Automatically rebuilt on failure
Operations
• Transformations (e.g. map, filter, groupBy)
• Actions (e.g. count, collect, save)
187
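A minimal RDD example using Spark's Java API (the input path is taken from the Data Sources slide below; the filtered word and class name are illustrative):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkErrorCount {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("SparkErrorCount");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Build an RDD from a text file (local path here; could just as well be hdfs://...)
    JavaRDD<String> lines = sc.textFile("file:///opt/httpd/logs/access_log");

    // Transformation (lazy): keep only lines containing "ERROR"
    JavaRDD<String> errors = lines.filter(line -> line.contains("ERROR"));

    // Action: triggers the computation and brings a result back to the driver
    System.out.println("Error lines: " + errors.count());

    sc.stop();
  }
}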
Unified Platform
188
Language Support
189
Data Sources
• Local Files
– file:///opt/httpd/logs/access_log
• S3
• Hadoop Distributed Filesystem
– Regular files, sequence files, any other Hadoop
InputFormat
• HBase
• Can also read from any other Hadoop data source.
190
Resilient Distributed Datasets (RDD)
191
Hadoop Tools
192
Hadoop cluster
193
Big Data and NoSQL
The Challenge
195
The Solution: NoSQL
196
Example Comparison: RDBMS vs. Hadoop
Typical comparison:
• Updates – Traditional RDBMS: read / write many times; Hadoop: write once, read many times
• Query response time – Traditional RDBMS: can be near immediate; Hadoop: has latency (due to batch processing)
197
Hadoop And Relational Database
(Comparison: what Hadoop is best used for vs. what a relational database is best used for.)
198
The NOSQL Movement
199
NoSQL, NOSQL or NewSQL
200
Why NoSQL?
202
203
Is NoSQL an RDBMS Replacement?
NO
Well… sometimes it is…
204
RDBMS vs. NoSQL
Rationale for choosing a persistent store:
Relational Architecture vs. NoSQL Architecture:
• Data – high value, high density, complex vs. low value, low density, simple
• Relationships – complex data relationships vs. very simple relationships
• Schema – schema-centric vs. schema-free, unstructured or semi-structured data
• Scaling – designed to scale up & out vs. distributed storage and processing
• Features – lots of general-purpose features/functionality vs. stripped-down, special-purpose data store
• Overhead – high overhead ($ per operation) vs. low overhead ($ per operation)
205
Scalability and Consistency
Scalability
208
The CAP Theorem
– Consistency.
– Availability.
– Partition Tolerance.
210
CAP in Practice
211
NoSQL BASE
• NoSQL stores usually provide BASE characteristics instead of ACID
(Basically Available, Soft state, Eventually consistent).
212
Eventual Consistency
213
Types of NoSQL
NoSQL Taxonomy
Type Examples
Key-Value Store
Document Store
Column Store
Graph Store
215
NoSQL Map
(Diagram: key-value stores, column stores, document databases and graph databases positioned by data size and performance versus data complexity, with the typical RDBMS marking the SQL comfort zone.)
216
Key Value Store
• Distributed hash tables.
• Very fast to get a single value.
• Examples:
– Amazon DynamoDB
– Berkeley DB
– Redis
– Riak
– Cassandra
217
Document Store
218
Column Store
219
How Records are Organized?
220
Query Data
• Even when we query a single column, we still need to read
the entire table and extract the column
Select Col2
From MyTable
222
Graph Store
• Inspired by Graph Theory
• Data model: Nodes, relationships, properties
on both
• Relational databases have a very hard time
representing a graph
• Example:
– Neo4j
– InfiniteGraph
– RDF
223
What is a Graph?
• An abstract representation of a set of objects
where some pairs are connected by links.
• Object (Vertex, Node) – can have attributes like
name and value
• Link (Edge, Arc, Relationship) – can have attributes
like type and name or date
224
Graph Types
• Undirected Graph – edges have no direction
• Directed Graph – edges point from one node to another
• Pseudo Graph – a node can be connected to itself (self-loop)
• Multi Graph – two nodes can be connected by more than one edge
225
More Graph Types
• Weighted Graph – edges carry a numeric weight (e.g. 10)
• Labeled Graph – edges carry a label (e.g. "Like")
226
Relationships
(Diagram: nodes alice (ID:1, TYPE:F), dafna (ID:1, TYPE:F) and bob (ID:2, TYPE:M) connected to a node named NoSQL (ID:1, TYPE:G) by "member" relationships with properties such as Since: 2012 – both nodes and relationships carry attributes.)
228
Q&A
Conclusion
230